Abstract

We examine how three different communication processes operating through social 
networks are affected by homophily - the tendency of individuals to associate with oth- 
ers similar to themselves. Homophily has no effect if messages are broadcast or sent via 
shortest paths; only connection density matters. In contrast, homophily substantially 
slows learning based on repeated averaging of neighbors' information and Markovian 
diffusion processes such as the Google random surfer model. Indeed, the latter pro- 
cesses are strongly affected by homophily but completely independent of connection 
density, provided this density exceeds a low threshold. We obtain these results by es- 
tablishing new results on the spectra of large random graphs and relating the spectra 
to homophily. We conclude by checking the theoretical predictions using observed high 
school friendship networks from the Adolescent Health dataset. 
1 Introduction



How does a society's structure affect the speed at which information diffuses within it? In 
particular, how do segregation patterns in a social network affect how information is diffused 
and aggregated within that society? How does that relationship change as we vary the 
communication process? In this paper, we take a step toward answering these questions 
by studying how information transmission is affected by homophily, an almost universally 
observed feature of social networks. 

Homophily - the tendency of individuals to associate with those similar to themselves -
has been observed since antiquity^1 and studied intensively by sociologists under that name
since Lazarsfeld and Merton (1954). It has been documented across a wide array of different
characteristics, including age, ethnicity, profession, religion, and various behaviors.
Indeed, homophily is one of the most pervasive and robust tendencies of social networks (see
McPherson, Smith-Lovin and Cook (2001) for a survey).

Learning and diffusion processes in networks have also been the focus of many recent 
studies in economics and related fields.^2 These derive results about the convergence, speed,
and/or accuracy of various communication processes, which include the diffusion of informa- 
tion, the formation of consensus, and other forms of learning and communication that are 
integral to social and economic behaviors. 

Despite the activity in these two related fields, there has been effectively no modeling 
of the impact of homophily on learning or diffusion processes. In this paper, we address 
this gap by modeling both homophily and communication explicitly and using the model to 
examine how homophily affects communication in various settings. 

We use a probabilistic model of homophily that we call the multi-type random network. It 
generalizes the seminal Erdos-Renyi random network model and nests several other models 
as well. In our model, agents are divided into different types, and then links are formed 
independently between various agents, with the probability of a link forming between two 
agents depending on the types of the agents involved.^3 Once the network is generated, it



^1 By Plato's time, homophily was already considered proverbial. In The Republic, Cephalus says: "For it
often happens that some of us elders of about the same age come together and verify the old saw of like to
like" (Book I, p. 329).

^2 See, for instance, Ellison and Fudenberg (1995), Bala and Goyal (1998), DeMarzo, Vayanos, and Zwiebel
(2003), Gale and Kariv (2003), Banerjee and Fudenberg (2004), Golub and Jackson (2007), Acemoglu,
Dahleh, Lobel and Ozdaglar (2008a), and Acemoglu, Nedic, and Ozdaglar (2008b).

^3 The probabilities governing the linking may arise in various ways - through a process of choices, or
through differential opportunities for meeting various types, or some combination; this is modeled explicitly
in Currarini, Jackson, and Pin (2009). Here we abstract away from these issues and take the structure of
linking probabilities as given.
remains fixed as learning or diffusion occurs. We consider three different processes. Whether
or not the homophily in the network structure affects the speed of the learning or diffusion 
turns out to depend on the type of learning or diffusion process. 

The first process we study is one in which information is either broadcast or navigated 
to its destination via shortest paths. This class includes many peer-to-peer systems like the 
Internet, mechanisms where messages are routed within an organization using something 
like an organizational chart, and information spreading phenomena where people tell an
important piece of news to everyone they know. The second is a process based on linear
updating or learning, first modeled by French (1956) and Harary (1959), where individuals
update their beliefs or actions by repeatedly taking weighted averages of their neighbors'
beliefs or actions. This captures boundedly rational processes of updating as well as pressures
to conform or a desire to match actions of neighbors.^4 The third is a random walk process
on a network, where some particle hops around the network, having equal probability of 
moving along any link out of its current node; one example is Google's famous model of a 
surfer who randomly follows a link out of the website he is currently visiting. These three 
processes encompass many important forms of network-based communication and diffusion. 

Our main results show that whether or not homophily has an impact depends on the 
communication process, and we detail precisely how homophily matters when it does. In 
particular, we show that processes based on shortest paths are unaffected by homophily, 
while averaging processes and random walks are affected and can be substantially slowed 
down by it. The reason that homophily does not affect average shortest path lengths is 
that even with substantial homophily and the resulting clustering, the number of vertices 
that can be reached in t steps is exponential in t. Homophily may change who is close 
and who is far from a given agent, but does not change the expected distance between two 
random nodes in the network. In contrast, processes based on weighted averaging or random 
walks are substantially slowed down as homophily increases: even though the average path 
length is unchanged, there are relatively fewer paths between agents of different types as 
homophily increases. This means that a node is more influenced by others of its own type, 
which reinforces global heterogeneity in beliefs or behaviors and slows down convergence to 
a steady state. 

In contrast, increasing link density speeds up shortest path communication, but has no 
effect on the speed of linear updating processes once density exceeds a low threshold. The 
first point is clear: adding links can only reduce distances, and will often do so dramatically 



^4 Such updating processes can lead a society to an optimal aggregation of information in some settings,
depending on the specifics of the social network structure (e.g., see Golub and Jackson (2007)).
by creating "shortcuts" which allow fast traversal of the network. To see why increasing link
density in a homophilous network does not reduce delay in reaching consensus, consider a 
network in which a typical agent has nine friends of her own type and one friend of a different 
type. Suppose such an agent is in an island that disagrees with the rest of the network. At 
every step, she is pulled toward the global consensus somewhat by her outside friend (10% 
of the influence on her beliefs) but pulled strongly back by her own island. Thus, consensus 
is slow. Now we double the number of links but hold the homophily fixed. So the typical 
agent will now have eighteen friends of her own type and two of a different type. As before, 
only 10% of her beliefs will be coming from outside, so she will be pulled toward consensus 
equally slowly. 

Table 1: A qualitative summary of our main results on how communication speeds in the
processes we study are affected by homophily and link density.

                                     Independent Variable
Process                              Density          Homophily
Shortest Path                        ↑ (faster)       0 (no effect)
Linear Updating and Random Walk      0 (no effect)    ↓ (slower)


A qualitative summary of these relationships appears in Table 1. An interesting implication
is the following. Consider a model where agents form links to others through a random
search process, such as the one discussed in Currarini, Jackson and Pin (2009). Suppose
that we consider a change in the matching technology - such as the introduction of social 
networking software - so that it becomes easier to search for agents of one's own type. If 
agents have some preference for connecting to agents of their own types, this would lead to 
an increase in the overall density of links in the network, but would also cause the typical 
agent to spend a larger fraction of links on agents of the same type. What would the ultimate 
impact be in terms of communication? Our results imply that shortest-path based commu- 
nication would become faster, but at the same time, Markovian processes of communication 
such as the linear updating process would converge more slowly! 

As an empirical illustration of the results, we examine the time to convergence of these 
processes on social networks from over eighty different high school friendship networks that 
exhibit varying degrees of homophily. We show that the predictions of our analysis fit well 
and that the speed of convergence depends on homophily in the predicted ways. 
Conceptual Outline

Our results involve several layers, including some contributions on the mathematical side 
which are needed to deduce the relationships discussed above. In particular, in order to 
relate network structure (including homophily) to convergence speeds we have to work with 
the spectral decomposition of a network, which naturally leads us to develop new results on 
the spectra of large random graphs. Given this layering, it is useful to have a road map of 
how all of our results fit together. 

After introducing the model and background definitions, we first present the main concep- 
tual results that relate homophily to speed of communication. Having the main conclusions 
in hand, we then present a series of results that are used to derive those conclusions. In 
particular, we begin with a statement of standard results relating speed of convergence to 
second eigenvalues. Next, we present our key technical theorem, showing that the second 
eigenvalue associated with a random network will be close to the second eigenvalue of a 
smaller matrix which deals only with relative linking probabilities across types. That is, all 
that really matters in determining the second eigenvalue in large societies is the expected 
connection probabilities between types. This result allows us to derive second eigenvalues for
the multi-type random networks based on homophily patterns and relate the eigenvalues to 
simple measures of homophily. Thus, the key technical result allows us to tie homophily to 
second eigenvalues, which in turn govern consensus and mixing times. Putting all of this 
together provides our conceptual conclusions. 

2 The Model: Networks and Processes 
2.1 Networks 

Given a set of n nodes $N = \{1, \ldots, n\}$, a network is represented via its adjacency matrix: a
symmetric n-by-n matrix A with entries in {0, 1}. The interpretation is that $A_{ij} = A_{ji} = 1$
indicates that nodes i and j are linked, and we restrict attention to undirected networks.^5
Let $d_i(A) = \sum_{j=1}^{n} A_{ij}$ denote the degree of node i. Let $d_{\min}(A)$ and $d_{\max}(A)$ be the

^5 Although we conjecture that the results can be extended to directed networks without much change in
the statements (as the communication/learning processes have direct extensions to the directed case), there 
are parts of the proofs that take advantage of the symmetry of the adjacency matrix, and so we are not sure 
of what modifications would ensue in examining directed networks. 
minimum and maximum degrees, respectively, and $d(A)$ denote average degree, and let

$$D(A) = \sum_i d_i(A)$$

be the total degree in the society.



2.2 Multi-Type Random Networks

In order to study the impact of homophily on communication and learning through a network, 
we introduce a random network model that incorporates homophily. The seminal random 
network model of Erdos-Renyi random networks is a special case of the model here (and
the same is true of the model based on degree distributions of Chung and Lu (2002)), which
allows us to make benchmark comparisons to the literature on speeds of processes on networks
without homophily.

The structure we use to model homophily is what we call the multi-type random network.^6
It consists of a vector $n = (n_1, \ldots, n_m)$ which captures how many nodes of each type there
are (and implicitly, how many types, m, there are), and a symmetric m-by-m matrix P,
whose entries in [0, 1] describe the probabilities of links between various types. Let $N_k$ be
the set of nodes of type k, and without loss of generality label nodes so that $\{1, \ldots, n_1\}$
are the nodes of the first type, $\{1 + n_1, \ldots, n_1 + n_2\}$ are the nodes of the second type, and
$N_k = \{1 + \sum_{i<k} n_i, \ldots, \sum_{i \le k} n_i\}$ are the nodes of the k-th type. The resulting random
network is captured via its adjacency matrix, which is denoted A(P, n) and is a random
variable. In particular, A(P, n) is built by letting the entries $A_{ij}$ with i > j be independent
Bernoulli random variables with parameter $P_{k\ell}$ if $i \in N_k$ and $j \in N_\ell$. That is, the entry $P_{k\ell}$
captures the probability that an agent of type k links to an agent of type $\ell$. We then fill in
the remaining entries of A by symmetry: $A_{ij} = A_{ji}$.
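The construction above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper: the function name `sample_multitype_network`, the `seed` argument, and the example values of the type sizes and of P are assumptions chosen for the illustration.

```python
import random

def sample_multitype_network(sizes, P, seed=None):
    """Draw one realization of A(P, n) as a list of lists with 0/1 entries.

    sizes[k] is the number of nodes of type k (the vector n in the text);
    P[k][l] is the probability of a link between a type-k and a type-l node.
    """
    rng = random.Random(seed)
    # Nodes are labeled type by type, so type_of[i] is the type index of node i.
    type_of = [k for k, n_k in enumerate(sizes) for _ in range(n_k)]
    n = len(type_of)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):  # entries with i > j are independent Bernoulli draws
            if rng.random() < P[type_of[i]][type_of[j]]:
                A[i][j] = A[j][i] = 1  # remaining entries filled in by symmetry
    return A, type_of

# Hypothetical example: two types of 50 nodes each, with within-type links
# six times as likely as cross-type links.
A, type_of = sample_multitype_network([50, 50], [[0.30, 0.05], [0.05, 0.30]], seed=1)
```

Setting every entry of P to the same value p recovers the Erdos-Renyi special case discussed next.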

Here are some special cases of the model. 

If $P_{k\ell} = p$ for all $k, \ell$, then this is simply an Erdos-Renyi random network.

The random network model of Chung and Lu (2002) is the special case where the only
heterogeneity in linking is induced by expected degrees. In particular, each type has an
expected degree $w_k$, and $P_{k\ell} = w_k w_\ell / W$ where $W = \sum_k n_k w_k$. Thus, the matrix P is
reduced from having m(m + 1)/2 degrees of freedom to having m, namely the expected


^6 This can be seen as a variant on statistical models that have been used to capture homophily in networks,
such as various p* models (e.g., see the references and discussion in Jackson (2008b)). There are also versions
of it in the computer science literature called the planted multisection model, e.g., McSherry (2001), and in
the community detection literature, e.g., Copic, Jackson and Kirman (2005).
degrees of the types.^7

A spatial model is one where each type of node has a parameter $\theta_k$, a point in a
finite-dimensional Euclidean space, that describes it, and $P_{k\ell} = f(\Delta(\theta_k, \theta_\ell))$ where f is a
decreasing function and $\Delta$ is Euclidean distance. Thus the probability that nodes link to each
other is a function of how similar their types are.

2.2.1 The Islands Model 

Another special case of the model that we discuss in some of the results below is an islands 
model. There we assume that all $n_i$ are equal, so that islands are equal-sized; we set $P_{kk} = p_s$
for all k and $P_{k\ell} = p_d$ for all $k \ne \ell$. Thus, nodes of the same type connect to each other with
one probability, and nodes of different types connect to each other with another probability.^8
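As a small illustration of the islands model, the following Python sketch builds the matrix P and computes the share of a typical node's expected links that stay within its own island. The helper names `islands_P` and `expected_same_type_share` are hypothetical, and the share formula is a large-network approximation that assumes equal-sized islands and ignores the node itself.

```python
def islands_P(m, p_s, p_d):
    """m-by-m linking matrix: p_s on the diagonal, p_d everywhere else."""
    return [[p_s if k == l else p_d for l in range(m)] for k in range(m)]

def expected_same_type_share(m, p_s, p_d):
    """Approximate fraction of a node's expected links that go to its own island.

    With m equal-sized islands, expected own-island links are proportional to
    p_s and expected cross-island links to (m - 1) * p_d.
    """
    return p_s / (p_s + (m - 1) * p_d)

P = islands_P(3, 0.5, 0.1)

# Scaling both probabilities by the same factor changes link density but
# leaves the same-type share, and hence this measure of homophily, unchanged.
sparse_share = expected_same_type_share(2, 0.09, 0.01)
dense_share = expected_same_type_share(2, 0.90, 0.10)
```

In this sketch both shares equal 0.9 up to rounding, mirroring the example in the introduction in which doubling the number of links leaves 10% of an agent's influence coming from outside her island.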



These examples are only a few of the possibilities, and clearly one can consider com- 
binations of these variations, and other considerations such as special cases where linking 
probabilities are built on some hierarchy, etc. 

2.2.2 Remarks on When the Multi-Type Random Networks Model is Useful 

These examples give an idea of how rich the multi-type random networks model is. How- 
ever, as with any model, it is most pointed in its predictions when we obtain a significant 
reduction in the dimensionality of the problem. In particular, for our main results involving 
representative agents to be most useful, it is helpful for there not to be too many types or, 
failing that, for the interaction between types to be described by only a few parameters. 
Effectively, the results reduce the problem of working with a network of n individuals to a 
simpler problem of working with m types. If there are as many types as individuals, then 
clearly that will be unhelpful without additional assumptions; however, a good deal of ex- 
planatory power comes out from looking at just a few types, to the extent that a few types 
capture most of the important variation in the data. As we will see from our look at the 
data, very simple definitions of types have substantial explanatory power. 



^7 Of course, if there are as many possible expected degrees as agents, then P is the same size as the
adjacency matrix of the network, which makes the model less tractable than when there are a few types; in 
empirical settings, usually the number of permitted expected degrees is small compared with the size of the 
network. 

^8 See Currarini, Jackson and Pin (2009) and Copic, Jackson and Kirman (2005) for illustrations and
applications of such a model. 
2.3 Communication and Learning Processes

We now describe the three different communication/learning processes that we consider. As we
shall see, there will be some differences in how these are affected by homophily. 

2.3.1 Shortest-Path Communication 

A shortest-path communication process is any process where the time for communication 
to occur between two nodes (however "communication" is defined) is proportional to the
length of the shortest path between the two nodes.^9 In a connected network, this applies
to broadcast processes, where nodes communicate to all neighboring nodes in each period, 
or to processes where the network is explicitly navigated by a traveler using some sort of 
addressing system. This applies to some social and many physical and electronic transmission 
processes. 



2.3.2 A Repeated Updating Learning Model 



The second process that we examine is based on a model first discussed by French (1956)
and Harary (1959), and articulated in a more general form by DeGroot (1974).

Given a network A, let T(A) be defined by $T_{ij}(A) = A_{ij}/d_i(A)$. Beginning with some initial
belief vector $b(0) \in [0, 1]^n$, let

$$b(t) = T(A)\, b(t-1)$$

for all $t \ge 1$. That is, agents form today's beliefs by taking the average of neighbors' beliefs
yesterday, where an agent can be his own neighbor. It is immediate that then

$$b(t) = T(A)^t\, b(0).$$



If the initial beliefs b(0) are independent and identically distributed draws from normal 
distributions around a common mean then the linear updating rule at t = 1 corresponds 
to Bayesian updating with certain priors about signal precisions, as discussed by DeMarzo,
Vayanos, and Zwiebel (2003). The behavioral aspect of the model concerns times after
the first round of updating. Here, it is no longer Bayesian to update using a weighted-average
rule, but due to the overwhelming complexity of the Bayesian calculation, we assume
agents continue using the simple averaging rule in later periods, too. More discussion of this
assumption can be found in DeMarzo, Vayanos, and Zwiebel (2003).



^9 Standard network definitions, such as shortest path, are omitted. See Jackson (2008b) for background
definitions. 
Beyond the interpretation of updating signals and "learning", the linear updating model
can also be interpreted as a model that captures behaviors where agents adjust their behav- 
iors to match the average of their neighbors' choices. In particular, it can be interpreted as 
myopic best-response updating in a game. Suppose that agents have to choose each period 
a variable $b_i$, which captures their behaviors, for instance which dialect of a language they
speak, and the dialects correspond to points in [0, 1]. The cost to i of communicating with
j is $(b_i - b_j)^2$. If each agent communicates with his neighbors according to A, then the best
response mapping is given by the linear updating rule. 

If T(A) is not connected, then it suffices to consider the asymptotic behavior of each
connected component to understand the full dynamics of the process.^10 Thus, we assume
from now on that T(A) is connected, and define $T(A)^\infty = \lim_{t \to \infty} T(A)^t$.

Lemma 1. If A is connected, then $T(A)^t$ converges to a limit $T(A)^\infty$ such that

$$(T(A)^\infty)_{ij} = \frac{d_j(A)}{D(A)}.$$

Lemma 1 follows from standard results on Markov chains (e.g., see Golub and Jackson
(2007) and Jackson (2008b) for details and background) and implies that for any given initial
vector of beliefs b(0), the limiting belief is

$$\lim_{t \to \infty} b(t) = T(A)^\infty b(0) = (\bar{b}, \bar{b}, \ldots, \bar{b}) \quad \text{where} \quad \bar{b} = \sum_i b_i(0)\, \frac{d_i(A)}{D(A)}.$$

Thus, the relative influence that an agent has over the final beliefs is his or her relative
degree. 
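To make the updating rule and its limit concrete, here is a minimal Python sketch on a hypothetical three-node line network. The function names and the example network are assumptions for illustration only; the diagonal entries of A encode the convention that an agent can be his own neighbor.

```python
def updating_matrix(A):
    """Row-normalize the adjacency matrix: T_ij = A_ij / d_i(A)."""
    return [[a / sum(row) for a in row] for row in A]

def step(T, b):
    """One round of updating: b(t) = T b(t-1)."""
    return [sum(t_ij * b_j for t_ij, b_j in zip(row, b)) for row in T]

def degree_weighted_average(A, b0):
    """Consensus belief implied by Lemma 1: sum_i b_i(0) d_i(A) / D(A)."""
    degrees = [sum(row) for row in A]
    return sum(d * x for d, x in zip(degrees, b0)) / sum(degrees)

# Hypothetical line network 0 - 1 - 2; the 1s on the diagonal are self-links.
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
b0 = [1.0, 0.0, 0.0]

T = updating_matrix(A)
b = b0
for _ in range(100):  # iterate b(t) = T b(t-1)
    b = step(T, b)

# Node 0 has degree 2 out of a total degree of 7, so the prediction is 2/7.
prediction = degree_weighted_average(A, b0)
```

After enough rounds every entry of `b` is numerically indistinguishable from `prediction`, so only agent 0's relative degree determines how much its initial belief moves the consensus.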

2.3.3 Random Walks 

The third process that we study is a random walk on a network. This is a process where 
a particle starts at some node and hops to any of its neighbors with equal probability at 
each step. One example to think of is that of a college student who is viewing Facebook 
profiles; at each step, she clicks on a random friend of the person whose profile she is currently 
viewing. Another example is the Google model of a surfer who randomly clicks on links as he 
navigates the World Wide Web.^11 Here, the particle starts at some location and transitions



^10 If the communication network is directed then convergence requires some aperiodicity in the cycles of
the network and works with a different segmentation into components, but still holds quite generally, as
discussed in Golub and Jackson (2007).

^11 Of course, we might think of him being biased toward following certain out-links from a given web page;
this could be modeled by using a nonuniform random walk; i.e., one in which all nonzero entries in a given
row of T are not the same. We suspect that our conclusions would simply be modified by weighting factors,
but the symmetry of the simpler case is handy in our proofs.






from node i to node j with probability Tij. The question is how long it takes to reach the 
steady state distribution on location. While this is a fairly specific process, and may not 
capture as many applications as the previous two processes, it has figured prominently in 
the literature on Markov processes and random graphs, and so it is a very useful benchmark.

Just as in the case of the linear updating learning model, Lemma 1 implies that if A is
connected, then $T(A)^t$ converges, and the limit distribution of the random walk is to be at
node i with probability $d_i(A)/D(A)$ regardless of the starting position of the random walk. So, the
limiting distribution of the time that the walk spends at a given node is proportional to its 
degree. 

2.4 Consensus Time and Mixing Time 

We now present ways of measuring the speed at which the above-defined processes operate 
on a given network. The different processes suggest different measures. 

First, shortest-path based processes have an obvious measure of speed, which is simply 
the average shortest path length in the network, or if one is worried about the longest time 
it could take to pass from some node to some other node, then the diameter of the network. 
These are standard notions, so there is no need to develop any special measure for such 
processes. 

The other two processes require measures of timing/distance that are more tailored to 
them. We now discuss each in turn. 

2.4.1 Distances between Vectors 

There are two distance measures that we focus on in measuring convergence. 

The first is a standard weighted squared deviation distance. Given two vectors of beliefs
v and u, let

$$\|v - u\|_w^2 = \sum_i w_i (v_i - u_i)^2.$$

In applying this, we will be interested in the differences between beliefs at time t and
their limit:

$$\|T(A)^t b - T(A)^\infty b\|_w^2.$$

It will be useful to use weights, w, that are the influences of the agents s(A), where

$$s(A) = \left( \frac{d_1(A)}{D(A)}, \ldots, \frac{d_n(A)}{D(A)} \right).$$

The distance

$$\|T(A)^t b - T(A)^\infty b\|_{s(A)}^2$$

examines the squared difference between agents' current beliefs and their limit beliefs. The
distance is weighted by the agents' degrees, which gives more weight to relatively more
influential agents. This quantity has a fairly simple interpretation. Consider the following
experiment: agents start with beliefs b; at time t, an agent is sampled uniformly at random. 
We imagine that he asks a random neighbor for his opinion: i.e. one of his neighbors is sam- 
pled uniformly at random; we record the square of this neighbor's deviation from consensus 
beliefs. The expectation of this variable under this experiment is the distance defined above. 
In other words, sampling each agent in proportion to his degree captures the deviation from 
consensus of an opinion sent at time t across a randomly chosen link in the network. 

The following lemma shows an obvious relationship between a straight averaging of the 
squared deviations in beliefs and this weighted averaging. 

Lemma 2. Let e = (1, . . . , 1). Then 

$$\frac{d(A)}{d_{\max}(A)} \, \|T(A)^t b - T(A)^\infty b\|_{s(A)}^2 \le \|T(A)^t b - T(A)^\infty b\|_{e/n}^2.$$

Thus, for graphs where the highest- and lowest-degree agents have degrees not too dif- 
ferent from the average, as in some of the networks we will be concerned with, whether 
or not we weight mean square deviation from consensus by degree will not make a large 
difference. Even in networks where there are large deviations in degree, if the number of deviant
nodes is bounded, then a direct variation of Lemma 2 implies that the two notions are
still close. Given that converting between these two notions only requires computing some 
bounds which will usually be good but which will depend on the application, we work with 
the degree-weighted version of the deviation measure as it has nicer mathematical properties 
and more intuitive relationships to the network structure. 

The other distance measure that we will work with is the total variation metric, where
the distance between a vector v and another vector u is

$$\|v - u\|_{TV} = \frac{1}{2} \sum_i |v_i - u_i|.$$

We will be applying this in cases where v and u are probability measures, so that $v \ge 0$,
$u \ge 0$, and $\sum_i v_i = \sum_i u_i = 1$. Then it is straightforward to see that

$$\|v - u\|_{TV} = \frac{1}{2} \sum_i |v_i - u_i| = \max_{S \subseteq N} \left| \sum_{i \in S} v_i - \sum_{i \in S} u_i \right|,$$

so the total variation metric keeps track of the maximal difference in the probability that
two measures assign to some set.
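The equivalence between the two expressions for total variation distance can be checked numerically. The following Python sketch is illustrative only; the function names and the two example distributions are assumptions, and the brute-force search over subsets is feasible only for very small vectors.

```python
from itertools import combinations

def tv_half_l1(v, u):
    """Total variation distance as half the sum of absolute differences."""
    return 0.5 * sum(abs(x - y) for x, y in zip(v, u))

def tv_max_over_sets(v, u):
    """Largest gap in the probability the two measures assign to any set S."""
    n = len(v)
    best = 0.0
    for r in range(n + 1):
        for S in combinations(range(n), r):
            best = max(best, abs(sum(v[i] - u[i] for i in S)))
    return best

# Two hypothetical probability vectors on three states.
v = [0.5, 0.3, 0.2]
u = [0.2, 0.3, 0.5]
half_l1 = tv_half_l1(v, u)
max_gap = tv_max_over_sets(v, u)
```

For probability vectors the maximizing set is the set of states where v exceeds u, which is why the two computations agree.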

2.4.2 Consensus Time and the Linear Updating Model 

A central question in the linear updating model is the rate at which beliefs of the society 
converge to their consensus limit. We define the consensus time as follows. 

Definition 1. The consensus time to $\varepsilon > 0$ of the network A is

$$CT(\varepsilon; A) = \sup_{b \in [0,1]^n} \min\{t : \|T(A)^t b - T(A)^\infty b\|_{s(A)}^2 < \varepsilon\}.$$

The need to consider different potential starting belief vectors b is clear, as if one starts 
with $b_i(0) = b_j(0)$ for all i and j, then consensus is reached instantly. Thus, the "worst case"
b will generally have beliefs that differ across types and is useful as a benchmark measure of
how homophily matters; taking the supremum in this way is standard in defining convergence
times (e.g., see Montenegro and Tetali (2006)).



Since T is a contraction under the distance measure (a standard fact about reversible 
Markov chains), once the mean-square deviation is below $\varepsilon$, it can never go above it again.
Thus, the definition is equivalent to letting $CT(\varepsilon; A)$ be the earliest time such that deviation
from consensus is small forever after. 
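The definition takes a supremum over all starting beliefs, which is not computed here. As a hedged illustration, the Python sketch below evaluates the inner minimum for a single starting vector b, so it gives only a lower bound on the consensus time. The function names, the two example networks, and the `max_steps` cutoff are assumptions made for this illustration.

```python
def consensus_time_for(A, b, eps, max_steps=10_000):
    """Smallest t with ||T^t b - T^inf b||^2_{s(A)} < eps for ONE starting b."""
    degrees = [sum(row) for row in A]
    D = sum(degrees)
    s = [d / D for d in degrees]                   # influence weights s(A)
    limit = sum(s_i * b_i for s_i, b_i in zip(s, b))
    T = [[a / d for a in row] for row, d in zip(A, degrees)]
    for t in range(max_steps + 1):
        deviation = sum(s_i * (b_i - limit) ** 2 for s_i, b_i in zip(s, b))
        if deviation < eps:
            return t
        b = [sum(t_ij * b_j for t_ij, b_j in zip(row, b)) for row in T]
    raise RuntimeError("deviation did not fall below eps within max_steps")

def two_cliques(size, bridges):
    """Two fully linked groups (with self-links) joined by the given bridges."""
    n = 2 * size
    A = [[1 if (i < size) == (j < size) else 0 for j in range(n)] for i in range(n)]
    for i, j in bridges:
        A[i][j] = A[j][i] = 1
    return A

# Beliefs that differ across the two groups, as in the "worst case" discussion.
b = [1.0] * 4 + [0.0] * 4
complete = [[1] * 8 for _ in range(8)]             # no homophily
bridged = two_cliques(4, [(3, 4)])                 # strong homophily, one bridge

time_complete = consensus_time_for(complete, b, 1e-4)
time_bridged = consensus_time_for(bridged, b, 1e-4)
```

On the complete network a single round of averaging already produces consensus, while the bridged network needs many more rounds because each agent is pulled back by her own group.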

2.4.3 Mixing Time and Random Walks 

There are many definitions measuring the distance of a random process to its limit that
come from the literature on Markov chains (e.g., see Montenegro and Tetali (2006)), and the
following definition is among the most common. Let $e_i$ be the unit vector with an entry of
1 in the i-th entry and 0's elsewhere.

Definition 2. The mixing time to $\varepsilon > 0$ of a network A is

$$MT(\varepsilon; A) = \sup_i \min\{t : \|e_i T(A)^t - e_i T(A)^\infty\|_{TV} < \varepsilon\}.$$
Mixing time keeps track of how different the probability across states is after t periods
compared to the limiting distribution. Taking the supremum over different starting states is
equivalent to considering all possible starting distributions. 
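For small networks the mixing time can be computed directly from the definition by starting the walk at each node in turn. The Python sketch below is an illustration under stated assumptions: the function name `mixing_time`, the `max_steps` cutoff, and the two example networks (both with self-links, so the walk is aperiodic) are not from the paper.

```python
def mixing_time(A, eps, max_steps=10_000):
    """Worst case over starting nodes of the first t with TV distance < eps."""
    n = len(A)
    degrees = [sum(row) for row in A]
    D = sum(degrees)
    stationary = [d / D for d in degrees]          # limit distribution d_i / D
    T = [[a / d for a in row] for row, d in zip(A, degrees)]
    worst = 0
    for i in range(n):
        p = [1.0 if j == i else 0.0 for j in range(n)]   # walk starts at node i
        t = 0
        while 0.5 * sum(abs(x - y) for x, y in zip(p, stationary)) >= eps:
            # Row vector times T: new p_j = sum_k p_k T_kj.
            p = [sum(p[k] * T[k][j] for k in range(n)) for j in range(n)]
            t += 1
            if t > max_steps:
                raise RuntimeError("walk did not mix within max_steps")
        worst = max(worst, t)
    return worst

complete = [[1] * 4 for _ in range(4)]             # every node linked to every node
line = [[1, 1, 0],
        [1, 1, 1],
        [0, 1, 1]]                                  # hypothetical line 0 - 1 - 2

mt_complete = mixing_time(complete, 0.1)
mt_line = mixing_time(line, 0.1)
```

The walk on the complete network reaches its limit distribution after one step, whereas the walk on the line takes longer when started from an end node.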

2.4.4 The Relation of Consensus and Mixing Time 

The consensus and mixing times have a close relationship intuitively, as both depend on 
how quickly $T(A)^t$ approaches its limit. The main difference is that consensus time works
with mean-squared deviations while mixing time works with the sum of absolute values of
the deviations. In terms of the mathematics, the difference between them is simply the
difference between the $\ell^2$ and $\ell^1$ norms as well as a difference in the normalizing constant.

Just as an illustration, consider the distance between (1, 0, 0, . . .) and (1/n, 1/n, 1/n, . . .).
If we think of these as probability measures, then they are quite different as one is a Dirac 
measure and the other is a uniform distribution. In contrast, if these are behaviors or beliefs, 
then only one agent in the society is deviating substantially from the limiting behavior or 
beliefs. Thus depending on the application, one might or might not want to consider these 
to be close or far apart. Under the $\ell^2$ norm used in calculating consensus time, these are
close to each other, while under the $\ell^1$ norm used in calculating mixing time, they are quite
far apart. 
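The contrast in this illustration can be made numerical with a short Python sketch. It assumes uniform weights 1/n in the squared-deviation distance, which is an assumption made only for this example (the text weights by relative degree), and the function name `compare_distances` is hypothetical.

```python
def compare_distances(n):
    """Distances between (1, 0, ..., 0) and the uniform vector (1/n, ..., 1/n)."""
    v = [1.0] + [0.0] * (n - 1)
    u = [1.0 / n] * n
    w = [1.0 / n] * n  # uniform weights: an assumption for this illustration
    squared_deviation = sum(w_i * (x - y) ** 2 for w_i, x, y in zip(w, v, u))
    total_variation = 0.5 * sum(abs(x - y) for x, y in zip(v, u))
    return squared_deviation, total_variation

squared_deviation, total_variation = compare_distances(100)
```

Working through the sums gives a weighted squared deviation of (n - 1)/n^2, which vanishes as n grows, and a total variation distance of (n - 1)/n, which approaches its maximum of 1.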

The mixing time approach is most natural and standard in the setting of Markov chains, 
where distributions are important, and consensus time is a natural measure in the setting of
linear updating models. We present results on both, and shall see that despite the differences,
they will behave similarly in our setting under the right normalizations. As we shall see also
in the empirical section (see Figure 2), consensus time and mixing time will essentially
coincide in the high school data. 

2.4.5 Asymptotics 

In some cases, we consider what happens as n grows. This is natural as there are many 
properties of random graphs that can be deduced to hold almost surely for "large" networks, 
but that are hard to express in any meaningful way for a small random graph where any 
possible configuration has a nontrivial probability of arising. In such cases, then there is 
some question as to the appropriate choice of e. 

Given the steady state distribution $s(A) = \left( \frac{d_1(A)}{D(A)}, \ldots, \frac{d_n(A)}{D(A)} \right)$, if the $d_i(A)$'s are growing
at roughly the same speed (a condition in some of the results below), then the entries of
s(A) will be of order 1/n. As such, a natural benchmark is to examine $CT(\gamma/n^2; A)$ (given
the squaring in the norm) and $MT(\gamma/n; A)$ for some fixed $\gamma > 0$.



3 The Speed of Communication 

We now present our main conceptual conclusions about the speed of communication and 
learning for the various processes and discuss the contrasts across different sorts of commu- 
nication. We then come back to the main technical contributions in the next section, which 
seem to be of some independent interest. 

3.1 Shortest-Path Communication 

Consider the multi-type random network (n, P) with an associated number of nodes n and
let $d_k(n, P) = \sum_{k'} P_{kk'} n_{k'}$ indicate the expected degree of a node in group k, let
$\tilde{d}(n, P) = \sum_k (d_k(n, P))^2 n_k / D(n, P)$ be the second order average degree in society,^12 let

$$D(n, P) = \sum_k d_k(n, P)\, n_k$$

be the total expected degree, and let $p(n, P) = D(n, P)/n^2$ be the average probability of a link.
Suppose that

(i) there exists $M < \infty$ such that $\max_{k,k'} d_k(n, P)/d_{k'}(n, P) < M$,

(ii) $\tilde{d}(n, P) > (1 + \varepsilon) \log(n)$ for some $\varepsilon > 0$,

(iii) $\log(\tilde{d}(n, P))/\log(n) \to 0$, and

(iv) there exists $\varepsilon > 0$ such that $\min_{k,k'} P_{kk'}/p(n, P) > \varepsilon$.

These conditions admit many cases of interest and can be understood as follows: (i)
implies that there is not a divergence in the expected degree across groups; (ii) ensures that
the average degree grows with n fast enough so that the network becomes connected with a
probability going to 1, so that the network will not have isolated components across which
communication is impossible; (iii) implies that average degree grows more slowly than n, as
otherwise the shortest path degenerates to being of length 1 or 2 and does not match most
empirical applications; and (iv) requires that there is some lower bound on the probability of a link
between groups relative to the overall probability of links in the network. This last condition

^12 Note that if the average degree $d_k(n, P)$ is the same across groups, then this is just the average degree.

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ensures that groups do not become so homophilous that the network becomes disconnected. 
These conditions still allow for substantial homophily. For instance, if is on the order of 
(i(n, P), then this still allows the probability of a link within same group to even become 
infinite relative to the probability across links, so that arbitrarily high levels of homophily 
are permitted. 

Theorem 1 (Jackson (2008a)). If the random network process $(n, P)$ satisfies (i)-(iv) 
then, asymptotically almost surely in $n$, $A(n, P)$ is connected; the average distance between 
nodes is $(1 + o(1)) \log(n)/\log(d(n, P))$; and the diameter of the largest component 
is $\Theta(\log(n)/\log(d(n,P)))$. 

Theorem [1] tells us that although homophily can change the basic structure of a network, it 
does not affect the average shortest-path distance between nodes in the network. Moreover, 
we have a precise expression for that average distance which is the same as it is in an 
Erdős-Rényi random network with the same average degree. Effectively, as we increase the 
homophily, we increase the density of links within a group but decrease the number of links 
between groups. The result is perhaps somewhat surprising in showing that these two effects 
perfectly balance each other to keep average path length unchanged. The intuition behind 
the theorem can be understood in the following manner. Suppose that every node had a 
degree of $d$ and that the network was a tree. Then the $k$-step neighborhood of a node would 
capture roughly $d^k$ nodes. Setting this equal to $n$ leads to a distance of $k = \log(n)/\log(d)$ 
to reach all nodes, and given the exponential expansion, this would also be the average 
distance. The theorem shows that this is exactly how the average distance behaves even 
when the network is not a tree, even when we noise up the network so that nodes do not 
all have the same degree, and even when we add substantial homophily to the network. In 
proving this, there are two critical parts: first, the randomness of the nodes' degrees does not 
substantially alter the calculation (even if a power law distribution is admitted in expected 
degrees); and second, even though homophily may alter the structure of the network, the 
shortest paths branching out from a given node are not much altered by homophily. The 
homophily affects which nodes are likely to be closer or further, but not the average distance. 
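
As a rough numerical illustration of this intuition (not part of the formal argument), the following sketch samples one islands network without homophily and one with strong homophily at the same expected degree, and compares the average shortest-path length with the tree heuristic $\log(n)/\log(d)$. The function names, the seed, and the parameter values (400 nodes, 4 islands, expected degree 20, 70% of links within island) are assumptions chosen for illustration; the theorem is asymptotic, so only rough agreement should be expected at this size.

```python
import math
from collections import deque

import numpy as np


def islands_adjacency(n, m, p_s, p_d, rng):
    """Sample a symmetric 0/1 adjacency matrix for m equal-sized islands (no self-links)."""
    types = np.repeat(np.arange(m), n // m)
    same = types[:, None] == types[None, :]
    probs = np.where(same, p_s, p_d)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # independent draws above the diagonal
    return (upper | upper.T).astype(int)


def average_distance(adj):
    """Mean shortest-path length over all connected ordered pairs, via BFS from every node."""
    n = adj.shape[0]
    neighbors = [np.flatnonzero(adj[i]).tolist() for i in range(n)]
    total, pairs = 0, 0
    for source in range(n):
        dist = [-1] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        reached = [x for x in dist if x > 0]
        total += sum(reached)
        pairs += len(reached)
    return total / pairs


rng = np.random.default_rng(0)
n, m, degree = 400, 4, 20.0
heuristic = math.log(n) / math.log(degree)

# No homophily: p_s = p_d. Strong homophily: 70% of expected links stay on the island.
p_flat = degree / n
p_s_hom = 0.7 * degree / (n / m)
p_d_hom = 0.3 * degree / (n - n / m)

dist_flat = average_distance(islands_adjacency(n, m, p_flat, p_flat, rng))
dist_hom = average_distance(islands_adjacency(n, m, p_s_hom, p_d_hom, rng))
print(f"tree heuristic log(n)/log(d):      {heuristic:.2f}")
print(f"average distance, no homophily:    {dist_flat:.2f}")
print(f"average distance, strong homophily: {dist_hom:.2f}")
```

In small samples the two averages need not coincide exactly; the point of the sketch is only that they are of the same order as the heuristic and close to each other.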

Corollary 1. Consider a process that has an expected communication time equal to the 
average distance between nodes. If it is run on two different random network formation 
sequences satisfying (i) to (iv) that have the same second order average degree as a function 
of $n$, then the ratio of the expected communication times on the two different random network 
sequences goes to 1, asymptotically almost surely. 



Footnote: The result of Theorem 1 holds for a more general random network model, and is specialized 
to the multi-type random network model considered here for this statement. 

The above results tell us that average distance is not affected by homophily, and diameter 
is affected only up to a fixed finite factor, provided there is some minimal level of inter- 
group connectivity. Thus average-path based communication processes are not affected by 
homophily but are affected by the link density in a society. 

3.2 Markovian Processes: Linear Updating and the Random Walk 

When we turn to the other forms of communication, there are substantial effects of ho- 
mophily. It is not simply average distance that matters, but the relative numbers of paths 
between different nodes that matters. 

To state the conceptual results most cleanly, we specialize to the islands model. The 
results extend to the more general multi-type random networks; those extensions require a 
somewhat longer exposition and so are discussed in the next section. 

In the context of the islands model, let us define two measures of homophily. The 
(unnormalized) homophily is defined as 

$$H = \frac{p_s}{p}$$

and captures how much more probable a link to a node of one's own type is compared to 
other types. This varies between 0 and $m$, the number of islands. If a node only links 
to same-type nodes, then the average linking probability $p$ becomes $p_s/m$ and so $H = m$, 
while if a node only links to nodes of other types, then $p_s = 0$ and so $H = 0$. We can also 
normalize the measure by dividing by the number of islands $m$; the normalized homophily 
is thus defined as 

$$h = \frac{H}{m} = \frac{p_s}{mp}.$$

Thus, $h$ is the fraction of a node's links that are expected to be to agents of the same type. 
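
A minimal sketch of these two measures, assuming the equal-sized islands model in which the average linking probability is $p = (p_s + (m-1)p_d)/m$, with $H = p_s/p$ and $h = H/m$. The function name `homophily_measures` is hypothetical and only serves the illustration.

```python
def homophily_measures(p_s, p_d, m):
    """Return (p, H, h) for the equal-sized islands model with m islands.

    p_s: probability of a link within an island
    p_d: probability of a link across islands
    """
    p = (p_s + (m - 1) * p_d) / m  # average linking probability
    H = p_s / p                    # unnormalized homophily, between 0 and m
    h = H / m                      # normalized homophily, between 0 and 1
    return p, H, h


# Only same-type links: H reaches its maximum m, and h equals 1.
print(homophily_measures(0.3, 0.0, 3))
# No bias toward own type: H = 1, and h = 1/m.
print(homophily_measures(0.2, 0.2, 5))
```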

If we index a sequence of societies by their cardinalities n, then the following theorem 
summarizes the main conclusions of how homophily affects consensus and mixing times. The 
details and proof in a more general setting appear in the next section. 



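Theorem 2. In the equal-sized islands model, if $p(n)n/\log^2(n) \to \infty$ and $h(n)$ is bounded 
away from 1, then for any $\delta < 1$, high enough $n$ ensures that the following is true with 
arbitrarily high probability: 

$$\frac{(1-\delta)\log(n)}{2\log\left(\left|\frac{m-1}{H-1}\right|\right)} \leq CT(\gamma/n^2; A(P,n)) \leq \frac{(1+\delta)\log(n)}{\log\left(\left|\frac{m-1}{H-1}\right|\right)}$$

and 

$$\frac{(1-\delta)\log(n)}{\log\left(\left|\frac{m-1}{H-1}\right|\right)} \leq MT(\gamma/n; A(P,n)) \leq \frac{\frac{3}{2}(1+\delta)\log(n)}{\log\left(\left|\frac{m-1}{H-1}\right|\right)}.$$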

Theorem 2 provides us with a precise relationship between homophily and consensus 
and mixing times. As homophily increases, both consensus time and mixing time increase. 
In particular, both the consensus time $CT(\gamma/n^2; A)$ and the mixing time $MT(\gamma/n; A)$ are 
proportional (up to a fixed factor) to 

$$\frac{\log(n)}{\log\left(\left|\frac{m-1}{H-1}\right|\right)}.$$

This is true independently of link density and does not impose any specific requirements 
about how many types (islands) there are. If $m$ becomes large, then this further simplifies 
and both the consensus time $CT(\gamma/n^2; A)$ and the mixing time $MT(\gamma/n; A)$ are proportional 
(up to a fixed factor) to 

$$\frac{\log(n)}{\log(1/h)}.$$

When homophily is low, marginal increases in homophily have only a small effect. But 
as homophily grows large ($H/m$ or $h$ closer to 1), the magnitude of the marginal effect of 
increased homophily becomes very large. 
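
To make the shape of this relationship concrete, the following sketch tabulates the common factor $\log(n)/\log(|(m-1)/(H-1)|)$ with $H = mh$ for a few values of $h$. The function names and the values $n = 10{,}000$ and $m = 5$ are assumptions for illustration only; the factor is undefined at $h = 1/m$ (where $H = 1$), so the grid starts above that point.

```python
import math


def limiting_second_eigenvalue(h, m):
    """(H - 1)/(m - 1) with H = m * h, the limiting second eigenvalue in the islands model."""
    return (m * h - 1) / (m - 1)


def time_scale(n, h, m):
    """log(n) / log(|(m - 1)/(H - 1)|), the factor shared by the consensus and mixing time bounds."""
    lam = abs(limiting_second_eigenvalue(h, m))
    return math.log(n) / math.log(1 / lam)


n, m = 10_000, 5
for h in (0.3, 0.5, 0.7, 0.9, 0.99):
    print(f"h = {h:4.2f}   time scale = {time_scale(n, h, m):8.1f}")
```

The printed values grow slowly for moderate $h$ and very quickly as $h$ approaches 1, which is the convexity described above.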

3.3 Discussion 

The interesting and intuitive contrast is the comparison between what matters for shortest-
path and Markovian communication processes. Comparing Theorems 1 and 2, we see that, 
in the context of communication based on shortest paths, homophily has no effect while 
average link density is critical. In contrast, when considering Markovian processes, we see 
that homophily is critical while average link density is irrelevant. 

This is an intuitive difference. With shortest-path communication, by definition, only the 
average distance between pairs of agents matters; it does not matter who is close and who is 
far from a given agent. Adding homophily changes network structure on the second dimension 
- introducing a certain kind of clustering (see Jackson (2008a) for more on clustering) 
- but it does not change the average lengths of the shortest paths branching out from each 
node. Agents just end up very close to other agents of their own types, who then provide 
connections to other types. While these last statements are not obvious, Theorem 1 asserts 
that the various effects interact in just the way needed to keep average distances the same. 

With Markovian communication, on the other hand, the time to convergence is determined 
by how homophilous the network is, not by the density of its connections, provided 
that a certain (low) density threshold is met. A network in which the underlying links are 
formed without discrimination will not have any way to support long-term heterogeneity 
away from the steady state, even if the network is fairly sparse. On the other hand, a 
network with clearly defined islands of clustered agents will be able to maintain long-term 
differences in beliefs or behaviors. Each island will converge to its own metastable state and 
stay there for a long time, disagreeing with the other islands. Increasing the overall num- 
ber of links while maintaining the homophily will not speed up convergence. The ratio of 
same-type connections to different-type connections for a given node will remain essentially 
the same, and so the network will be able to support the same disagreement based on the 
self-reinforcing effect within given types. 
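
The density-invariance argument can be illustrated with a deterministic toy calculation. The sketch below is an assumption-laden simplification: it replaces the random network by island-level expected link shares (so there is no sampling noise), starts one island with a different belief, and counts rounds of averaging until all islands agree to within a tolerance. The function names, the tolerance, and the parameter values are hypothetical.

```python
import numpy as np


def island_updating_matrix(m, p_s, p_d):
    """Expected island-level updating weights: the share of a node's links going to each island."""
    P = np.full((m, m), p_d, dtype=float)
    np.fill_diagonal(P, p_s)
    return P / P.sum(axis=1, keepdims=True)


def steps_to_agree(Q, tol=1e-3, max_steps=100_000):
    """Rounds of averaging until the largest gap between island beliefs falls below tol."""
    beliefs = np.zeros(Q.shape[0])
    beliefs[0] = 1.0  # island 0 starts out disagreeing with everyone else
    for t in range(1, max_steps + 1):
        beliefs = Q @ beliefs
        if beliefs.max() - beliefs.min() < tol:
            return t
    return max_steps


no_homophily = steps_to_agree(island_updating_matrix(4, 0.02, 0.02))
sparse_homophilous = steps_to_agree(island_updating_matrix(4, 0.30, 0.02))
dense_homophilous = steps_to_agree(island_updating_matrix(4, 0.60, 0.04))  # twice the links
print(no_homophily, sparse_homophilous, dense_homophilous)
```

Doubling both linking probabilities leaves the ratio of same-type to different-type links unchanged, so the two homophilous cases take the same number of rounds, while the network without homophily agrees almost immediately.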

4 Relating Communication Times, Second Eigenvalues, and Homophily 

Theorem [2] is proven via a series of mathematical results which are of some interest in 
their own right. Again, as outlined in the introduction, the relation of speed of learning 
to homophily is established by breaking things into two parts: how speed relates to second 
eigenvalues, and how second eigenvalues relate to homophily. The key result that unlocks 
the second part of this puzzle is a representative agent theorem that shows that the sec- 
ond eigenvalue of a random network is, asymptotically, only dependent on the underlying 
probabilities of linking between different types of agents. The extra noise of which specific 
agents are linked to which others is essentially irrelevant in a large network. Only the broad 
patterns of linking across different types are important. Once we have proved this, we can 
relatively easily deduce results relating second eigenvalues to homophily, and then complete 
the picture relating speed to homophily. 

The outline is summarized in Figure 1. A specific roadmap of the technical results is as 
follows. 

Measures of convergence speed form the top layer of Figure 1 and summary statistics 
related to large-scale network structure form the bottom layer. The middle layer is the 
spectral intermediary that allows us to tie everything together. It is well-known that the 
second eigenvalue of a network's Markov matrix is a good proxy for the convergence speed 
of linear updating processes. Thus, in relating the top and middle layers of Figure 1, we 
use standard spectral results from the Markov chain literature to provide upper and lower 
bounds on mixing and consensus times. The results on mixing time, which is a central 
concept in Markov chain theory that captures how long it takes a process on a network to 
become random, are completely standard. We also define a notion of consensus time, which 
is essentially the time required for the mean squared deviation from consensus beliefs to get 
small. Bounding this quantity requires adapting standard results in a straightforward way 
(Lemma 3) but it turns out that in the multi-type random graph setting, tighter bounds 
than usual can be obtained for it (Proposition 6). 

[Figure 1 diagram. Top layer: Time-Based Measures of Convergence (Consensus Time, Mixing Time). 
Middle layer: Second Eigenvalue. Bottom layer: Network Structure Statistics (Linking Probabilities 
in Multi-Type Random Network; Degree-Weighted Homophily; Traditional Measures of Homophily 
Based on Bias Toward Own Type). Legend: result is asymptotic; result is for the islands model; 
result is for the islands model and is asymptotic.] 

Figure 1: The paper's conceptual structure. A line between two quantities indicates that 
one is used to bound or characterize the other; quantities located lower in the diagram are 
used to characterize the ones that are higher. 

The main novel technical work of the paper concerns the relationship between the middle 
and bottom layers of the figure. First, we prove a general result, Theorem 3, which shows 
that in large multi-type random networks, the study of the second eigenvalue of the entire 
network can be reduced to a computation based on a representative agent matrix which 
contains only one agent for each type. Building on this, we relate the second eigenvalue to 
more concrete measures of homophily. One that turns out to be particularly well suited to 
the study of eigenvalues, and hence of convergence, is a new quantity called degree-weighted 
homophily (DWH). This quantity measures the relative advantage of same-type links over 
different-type links, but does so in a way that takes into account different degrees and group 
sizes. Proposition 1 shows that, in arbitrary networks, this quantity always provides a lower 
bound on the second eigenvalue, and hence on consensus time. These results already entail that 
homophily slows learning and provide general tools for studying the relationship in arbitrary 
multi-type networks. For more concrete characterizations in an important special case, we 
turn to networks in which agents split into equally sized "islands", where each island is a 
different type. Agents only discriminate based on whether someone else is inside or outside 
of their own islands. In this case, we can exactly characterize second eigenvalues for large 
networks. These results can be stated in terms of a close relative of 
DWH (Theorem 2) and in terms of more traditional unweighted measures of homophily 
(Corollary 3). 

Putting the transitions between the layers together, we end up with a tight relationship 
for large island networks between homophily and the speed of learning, which then leads to 
the conclusions described above. These are summarized in Theorem 2. 

4.1 Relating Consensus and Mixing Times to Second Eigenvalues 

We first state results that give fairly precise bounds on how the consensus time and mixing 
time of a matrix depend on its second eigenvalue. For any stochastic matrix $T$, let 
$1 = \lambda_1(T), \ldots, \lambda_n(T)$ be its eigenvalues sorted by magnitude in decreasing order. 

4.1.1 Consensus Time 

Lemma 3. Assume $A$ is connected, let $\lambda_2(T(A))$ be the second largest eigenvalue in magnitude 
of $T(A)$, and let $s$ be the (unique) steady-state distribution, with $\min_i s_i = \underline{s}$. If 
$\lambda_2(T) \neq 0$, then for any $0 < \varepsilon < 1$: 

$$\frac{\log(1/(4\varepsilon)) - \log(1/\underline{s})}{2\log(1/|\lambda_2(T)|)} \leq CT(\varepsilon; A) \leq \frac{\log(1/\varepsilon)}{2\log(1/|\lambda_2(T)|)}.$$

If $\lambda_2(T) = 0$, then for every $0 < \varepsilon < 1$ we have $CT(\varepsilon; A) = 1$. 

If $\varepsilon$ is fairly small, then the bounds in the lemma are close to each other and so we have 
a quite precise characterization in terms of the spectrum of the underlying social network. 

The proof of this result follows fairly standard techniques from the spectral literature 
and appears in the appendix. 

The lower bound in Lemma 3 includes a term $\log(1/\underline{s})$ which can grow as $n$ grows. We 
improve on the lower bound in Proposition 6 in the appendix, where we take advantage of 
the random graph structure to obtain a lower bound that does not depend on $\underline{s}$ or $n$. 
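
As a small worked example of the bounds in Lemma 3, the sketch below evaluates them on a six-node network made of two triangles joined by a single link. It assumes the updating matrix is the row-normalized adjacency matrix, $T_{ij} = A_{ij}/d_i$, and that the steady-state weight of node $i$ is $d_i/\sum_j d_j$; the function names and the example network are hypothetical. It computes only the two bounds, not the consensus time itself.

```python
import math

import numpy as np


def updating_matrix(adj):
    """Row-normalize an adjacency matrix: T_ij = A_ij / d_i."""
    adj = np.asarray(adj, dtype=float)
    return adj / adj.sum(axis=1, keepdims=True)


def second_eigenvalue_magnitude(T):
    """|lambda_2(T)|: the second largest eigenvalue modulus of T."""
    moduli = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return float(moduli[1])


def lemma3_bounds(adj, eps):
    """Return (|lambda_2|, lower bound, upper bound) on CT(eps; A) from Lemma 3."""
    adj = np.asarray(adj, dtype=float)
    lam2 = second_eigenvalue_magnitude(updating_matrix(adj))
    degrees = adj.sum(axis=1)
    s_min = degrees.min() / degrees.sum()  # smallest steady-state weight
    denom = 2 * math.log(1 / lam2)
    lower = (math.log(1 / (4 * eps)) - math.log(1 / s_min)) / denom
    upper = math.log(1 / eps) / denom
    return lam2, lower, upper


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single link 2-3.
barbell = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    barbell[i, j] = barbell[j, i] = 1

lam2, lower, upper = lemma3_bounds(barbell, eps=1e-3)
print(f"|lambda_2| = {lam2:.4f}, bounds on CT: [{lower:.2f}, {upper:.2f}]")
```

The bottleneck between the two triangles pushes $|\lambda_2|$ toward 1, which widens both bounds; a bipartite network would give $|\lambda_2| = 1$ and make the denominators vanish, so the sketch should not be applied there.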



4.1.2 Mixing Time 



Next, let us consider mixing time. Again, we can derive bounds based on the second eigen- 
value. In this case, there is a difference that reflects the difference in the norms associated 
with these measures. 

The following lemma is adapted from Montenegro and Tetali (2006) (see their Section 
2.4: "Does Reversibility Matter"). 

Lemma 4. [Montenegro and Tetali (2006)] Assume $A$ is connected. Let $\lambda_2(T)$ be the second 
largest eigenvalue in magnitude of $T$, and $s$ be the (unique) steady-state distribution, with 
$\min_i s_i = \underline{s}$. If $\lambda_2(T) \neq 0$, then for any $0 < \varepsilon < 1$: 

$$\frac{\log(1/(2\varepsilon))}{\log(1/|\lambda_2(T)|)} \leq MT(\varepsilon; A) \leq \frac{\log(1/\varepsilon) + \log(1/\underline{s})/2}{\log(1/|\lambda_2(T)|)}.$$

If $\lambda_2(T) = 0$, then for every $0 < \varepsilon < 1$ we have $MT(\varepsilon; A) = 1$. 

4.2 Relating Second Eigenvalues to Network Structure 



4.2.1 A Representative Agent Theorem 

We now present our main technical result, a "representative-agent" theorem that allows us to 
analyze the convergence of a multi-type random graph by studying a much smaller graph in 
which there is only one node for each type of agent. We show that under some conditions on 
the minimum expected degree, the second eigenvalue of a realized multi-type random 
graph converges in probability to the second eigenvalue of this representative-agent matrix. 
This result is useful for dramatically simplifying computations of approximate consensus 
times and mixing times, both in theoretical results and in empirical settings, as now the 
random second eigenvalue can be accurately predicted knowing only the relative probabilities 
of connections across different types, as opposed to anything about the precise realization of 
the random network. 

Recalling the notation from Section 2.2, let $d_{k\ell}(P,n) = n_\ell P_{k\ell}$ be the expected number of 
links that a node of type $k$ will have with nodes of type $\ell$, and let $d_k(P,n) = \sum_\ell d_{k\ell}(P,n)$ 
be the expected degree of a node of type $k$. Let $Q(P, n)$ be a matrix of the same dimensions 
as $P$ with entries 

$$Q_{k\ell}(P,n) = \frac{d_{k\ell}(P,n)}{d_k(P,n)}.$$

So, $Q_{k\ell}$ is the expected relative fraction of links that a node of type $k$ will have with nodes 
of type $\ell$. This simplifies things in two respects relative to the realized random network. 
First, it works with groups rather than individual nodes, and second, it works with expected 
fractions rather than realized values. 

Theorem 3. Consider a multi-type random network described by $(P,n)$. For any $\delta > 0$ 
there exists $K$ such that if $\min_k d_k(P, n) \geq K \log^2 n$ then 

$$|\lambda_2(T(A(P,n))) - \lambda_2(Q(P,n))| < \delta, \qquad (1)$$

with probability at least $1 - \delta$. 

Footnote: There is one technical issue which does not substantially affect any results but which deserves some 
comment. If one is thinking of a multi-type random network as describing the relationships relevant for a 
boundedly rational process of belief updating, then the self-weights assumed in our formulation of the model 
are unnatural. In particular, if we think of a link from $i$ to $j$ as capturing the fact that $i$ has access to the 
belief of $j$, then all nodes should have self-links; in the model as we have described it, a node of type $k$ has 
a self link with probability only $P_{kk}$. It turns out that formulating the model so that all nodes always have 
self-links does not change any of the results presented in this section, except for a small change in asymptotic 
rates of convergence in Theorem 3 and its consequences. This is because the spectral norm of the difference 
between the matrix we work with in the proof and the matrix with self-links decays to 0, assuming that 
minimum degree is unbounded. The same is true if we forbid self-links, which is natural in the dialect-choice 
game of Section 2.3.2 or the Facebook random walk of Section 2.3.3. In short, the results are not sensitive 
to how we model self links, provided that everyone treats all neighbors, including oneself, equally. 

Theorem 3 is a law of large numbers for spectra of multi-type random graphs. Such 
techniques are a central tool in the random graphs literature; they show that various important 
properties of random graphs converge to their expectations, which shows that these 
locally haphazard objects have very precise global structure. The closest antecedent to this 
particular theorem is by Chung, Lu and Vu (2004) for Markov or Laplacian matrices of networks 
without homophily. This theorem is the first of its kind to apply to these matrices in 
a model that allows homophily and the associated heterogeneities in linking probabilities. 
We employ a similar strategy of proof, which relies on decomposing the random matrix rep- 
resenting our graph into two pieces: an "orderly" piece whose entries are given by linking 
probabilities between nodes of various types, and a noisy piece due to the randomness of 
the actual links. By bounding the spectral norm of the noise, we show that, asymptotically, 
the second eigenvalue of the orderly part is, with high probability, very close to the second 
eigenvalue of the random matrix of interest. Then we note that computing the second eigen- 
value of the orderly part requires dealing only with a representative-agent matrix, which will 
usually be small. 
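
The following sketch illustrates the comparison numerically. It builds the representative-agent matrix with entries $Q_{k\ell} = n_\ell P_{k\ell} / \sum_{\ell'} n_{\ell'} P_{k\ell'}$, samples one network at two different densities with the same relative linking probabilities, and compares second eigenvalues. The function names, the seed, the group sizes, and the linking probabilities are all assumptions for illustration; the theorem is asymptotic, so a visible but small gap at this size is expected.

```python
import numpy as np


def representative_matrix(P, sizes):
    """Q_kl: expected share of a type-k node's links that go to type l."""
    d = np.asarray(P, dtype=float) * np.asarray(sizes, dtype=float)[None, :]  # d_kl = n_l P_kl
    return d / d.sum(axis=1, keepdims=True)


def second_eigenvalue(M):
    """Second largest eigenvalue in magnitude (real part; these spectra are real in theory)."""
    vals = np.linalg.eigvals(M)
    return float(vals[np.argsort(-np.abs(vals))][1].real)


def sample_updating_matrix(P, sizes, rng):
    """Draw a symmetric network with link probabilities P (self-links allowed), then row-normalize."""
    types = np.repeat(np.arange(len(sizes)), sizes)
    probs = np.asarray(P, dtype=float)[np.ix_(types, types)]
    upper = np.triu(rng.random(probs.shape) < probs)
    A = (upper | upper.T).astype(float)
    return A / A.sum(axis=1, keepdims=True)


rng = np.random.default_rng(1)
sizes = [200, 200, 200]
base = np.array([[0.30, 0.05, 0.05],
                 [0.05, 0.30, 0.05],
                 [0.05, 0.05, 0.30]])

results = []
for scale in (1.0, 0.5):  # same relative probabilities, half the density
    P = scale * base
    lam_T = second_eigenvalue(sample_updating_matrix(P, sizes, rng))
    lam_Q = second_eigenvalue(representative_matrix(P, sizes))
    results.append((lam_T, lam_Q))
    print(f"scale {scale}: realized {lam_T:.3f}, representative-agent {lam_Q:.3f}")
```

Because scaling all linking probabilities by a constant leaves $Q$ unchanged, the representative-agent eigenvalue is identical across the two densities, and the realized eigenvalues stay close to it.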

We note also that $P$, and hence the result, is robust to certain types of measurement 
limitation and/or error. The interaction matrix $P$ can be estimated without requiring precise 
information about agents' actual degrees, using only their relative proclivity to connect to 
different types of agents. For example, it would be enough to have a representative sample 
of each type's neighbors. This makes the model relevant in practical settings, since it is often 
very difficult to know what fraction of agents' friends are actually reported or observed. 

The usefulness of Theorem 3 becomes evident through a series of its implications. First, 
it can be used to tighten the lower bound in Lemma 3, so that second eigenvalues become 
an even better proxy for consensus time; this is done in Proposition [6] in the appendix. In 
the next section, we also use the theorem to derive expressions allowing us to understand 
how homophily affects consensus and mixing time in the islands model. Lastly, we use it to 
show that the degree weighted homophily bound in Proposition [1] is tight. 



Footnote: Theorem 3 has a superficial similarity to Theorem 10 in McSherry (2001), which is based on the work 
of Füredi and Komlós (1981) and Alon, Krivelevich, and Vu. These results show that the spectra 
of certain random matrices are close to the spectra of their expectations. However, in those papers, the 
matrices of interest are adjacency matrices, whose entries are independent, and this independence is crucial 
to the arguments. Since the matrix of interest here is the updating matrix $T(A)$, whose entries are not 
independent (because each row is normalized), the same techniques do not go through. Instead, we build on 
work of Chung, Lu, and Vu (2004) and on the other papers just cited to prove a theorem that has a similar 
flavor but in this different and more complex setting. 

4.2.2 Non-Spectral Measures of Homophily 

Consensus and mixing times relate to the second eigenvalue of the 
updating matrix and, through our representative agent theorem, to the second eigenvalue of 
the matrix of probabilities of connection across types in the random networks model. These 
characterizations are still somewhat abstract as the second eigenvalue is an implicitly defined 
statistic that may be hard to grasp. In order to understand the implications of the structural 
feature of homophily on mixing and consensus time, we need to develop some understanding 
of how homophily affects the second eigenvalue. 

We begin with some general definitions of homophily, and later specialize to multi-type 
random networks. 



General Networks and Degree-Weighted Homophily. Let us partition $N$ into two 
subsets, $M$ and $M^c$. First, we define a notion of the weight between two groups. 

Definition 3. Given $T = T(A)$ and two subsets of nodes, $B, C \subset N$, let 

$$W_{B,C} = \frac{1}{|B||C|} \sum_{(i,j) \in B \times C} T_{ij} T_{ji}.$$

$W_{B,C}$ keeps track of the relative weight between two sets of nodes $B$ and $C$, and is a 
measure that ranges between 0 and 1. The weight of an edge is proportional to the reciprocal 
of the product of the degrees of the nodes on its ends: when an edge is between two nodes 
that have many neighbors, it doesn't count for much, but when it is between two that have 
few neighbors, it counts for a lot. The weight also depends on group size: individual edges 
within larger groups matter less than those within smaller groups. With this definition in 
hand, we define a notion of degree-weighted homophily. 

Definition 4. Given any $\emptyset \subsetneq M \subsetneq N$, let the degree-weighted homophily of the network 
$A$ relative to $M$ be defined by 

$$DWH(M; A) = \frac{W_{M,M} + W_{M^c,M^c} - 2W_{M,M^c}}{\frac{1}{|M|^2}\sum_{i \in M} \frac{1}{d_i(A)} + \frac{1}{|M^c|^2}\sum_{i \in M^c} \frac{1}{d_i(A)}}, \qquad (2)$$

where the $W$'s are relative to $T(A)$. 

The term in the numerator keeps track of how much of the weight in $T$ falls within $M$ 
and within $M^c$, and how much weight goes between these sets of nodes. So, links within the 
group $M$ or its complement $M^c$ increase the degree weighted homophily and links between 
the two groups decrease it. The term in the denominator is a normalizing value which 
guarantees that this quantity is always between -1 and 1. 

To see that the degree-weighted homophily has an intuitive interpretation, consider a 
very simple special case. Suppose $|M| = n/2$ and $A$ corresponds to a regular graph, where 
all degrees are equal. Then 

$$DWH(M; A) = \frac{\#(\text{within-group edges}) - \#(\text{between-group edges})}{\#(\text{total edges})}. \qquad (3)$$

The theoretical justification for the usefulness of this measure is that it provides a lower 
bound on the magnitude of the second eigenvalue of $T$, and in the limit a tight bound. 
Let 

$$DWH(A) = \max_{\emptyset \subsetneq M \subsetneq N} |DWH(M; A)|.$$

Thus, the degree weighted homophily of a given network is the maximum level of degree 
homophily across different possible splits of the network. 

Proposition 1. Assume that $A$ is connected. Then 

$$|\lambda_2(T(A))| \geq |DWH(A)|. \qquad (4)$$

Combining this with Lemmas 3 and 4, we see that degree weighted homophily provides a 
lower bound on the consensus and mixing times. 

Corollary 2. Assume $A$ is connected and let $\underline{s} = \min_i s_i$ as in Lemma 3. Then for any $0 < \varepsilon < 1$, 

$$CT(\varepsilon; A) \geq \frac{\log(1/(4\varepsilon)) - \log(1/\underline{s})}{2\log(1/DWH(A))}$$

and 

$$MT(\varepsilon; A) \geq \frac{\log(1/(2\varepsilon))}{\log(1/DWH(A))}.$$

Compared with Lemmas 3 and 4, the results have only one inequality each. This is 
because DWH only provides a lower bound on eigenvalues, and hence on the convergence 
times. In the next subsection, we supply asymptotic upper bounds in the islands setting. 

Footnote: That $DWH(M; A)$ always lies between -1 and 1 can be verified by using the expression of DWH 
as a quadratic form in the proof of Proposition 1 below and then noting that the spectral norm of the 
matrix $T(A)$ is 1. 

Footnote: Degree weighted homophily has intuitive relationships to a weighted version of a min cut, although 
this degree weighted homophily measure turns out to be the right one for our purposes. 
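
The definitions can be checked on a small example. The sketch below assumes the weight between two groups is $W_{B,C} = \frac{1}{|B||C|}\sum_{i \in B, j \in C} T_{ij}T_{ji}$ with $T_{ij} = A_{ij}/d_i$, and that $DWH(M; A)$ divides $W_{M,M} + W_{M^c,M^c} - 2W_{M,M^c}$ by $\frac{1}{|M|^2}\sum_{i \in M} 1/d_i + \frac{1}{|M^c|^2}\sum_{i \in M^c} 1/d_i$. The function names and the example (a 3-regular "prism": two triangles joined by three links) are hypothetical, and the brute-force maximum over splits is feasible only for very small networks.

```python
from itertools import combinations

import numpy as np


def weight(T, B, C):
    """W_{B,C} = (1 / (|B||C|)) * sum over i in B, j in C of T_ij * T_ji."""
    block = T[np.ix_(B, C)] * T.T[np.ix_(B, C)]
    return float(block.sum() / (len(B) * len(C)))


def dwh(adj, M):
    """Degree-weighted homophily of the network relative to the node set M."""
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    T = adj / degrees[:, None]
    M = np.asarray(sorted(M))
    Mc = np.setdiff1d(np.arange(adj.shape[0]), M)
    numerator = weight(T, M, M) + weight(T, Mc, Mc) - 2 * weight(T, M, Mc)
    denominator = (1 / degrees[M]).sum() / len(M) ** 2 + (1 / degrees[Mc]).sum() / len(Mc) ** 2
    return numerator / denominator


# Prism graph: triangles {0, 1, 2} and {3, 4, 5}, plus the links 0-3, 1-4, 2-5.
prism = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (0, 3), (1, 4), (2, 5)]:
    prism[i, j] = prism[j, i] = 1

dwh_triangle = dwh(prism, [0, 1, 2])  # (6 within - 3 between) / 9 total edges
dwh_max = max(abs(dwh(prism, M)) for size in range(1, 6) for M in combinations(range(6), size))
T = prism / prism.sum(axis=1, keepdims=True)
lam2 = float(np.sort(np.abs(np.linalg.eigvals(T)))[::-1][1])
print(f"DWH for the triangle split: {dwh_triangle:.4f}")
print(f"max |DWH| over all splits: {dwh_max:.4f}, |lambda_2|: {lam2:.4f}")
```

For the triangle split the value matches the edge-count expression for regular graphs, and the maximum over all splits does not exceed $|\lambda_2|$, in line with Proposition 1.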



The Islands Model and Simpler Measures of Homophily. In order to develop the 
clearest and most intuitive relationships between homophily and the resulting consensus and 
mixing times, we now examine a specific case of the multi-type random network model. 
Recall the islands model from Section 2.2.1 and consider the case where there are $m \geq 2$ 
equally-sized groups. In all the results of this section, we will consider only $n$ divisible by $m$, 
and the results will concern limits as $n \to \infty$. All quantities (probabilities, homophilies, 
etc.) are implicitly indexed by $n$, but we suppress this indexing unless it is important to 
emphasize it. Let $p_s$ and $p_d$ be the probability of links within and across types, respectively, 
and $p$ be the overall probability of links. 

Let $EDWH(m, p_s, p_d)$ denote the expected degree weighted homophily in the islands 
model where we calculate this relative to the expected number of links within and across 
islands. That is, if we have a collection of $k$ islands, $M$, let 

$$EW_{M,M} = \frac{[p_s k + p_d k(k-1)]/d^2}{k^2}$$

and 

$$EW_{M,M^c} = \frac{p_d k(m-k)/d^2}{k(m-k)} = p_d/d^2,$$

where $d = pn$ is the expected degree, and $EW_{M,M}$ and $EW_{M,M^c}$ are the expected versions of 
$W_{M,M}$ and $W_{M,M^c}$. Then, we have an expected variation of degree weighted homophily: 

$$EDWH(M; m, p_s, p_d) = \frac{EW_{M,M} + EW_{M^c,M^c} - 2EW_{M,M^c}}{\frac{1}{|M|^2}\sum_{i \in M} \frac{1}{d} + \frac{1}{|M^c|^2}\sum_{i \in M^c} \frac{1}{d}}.$$

Let $I(n)$ denote the subsets of nodes that are collections of islands, so that if $M \in I(n)$ then 
any node in $M$ is of a different type from any node in $M^c$. Let 

$$EDWH(m, p_s, p_d) = \max_{M \in I(n)} EDWH(M; m, p_s, p_d).$$

This is not quite the expected degree weighted homophily, since we are working with expectations 
in the numerator and denominator. 

We can now prove the following theorem which lets us deduce how consensus and mixing 
time depend on degree weighted homophily in the islands model (see Corollary 4), since it 
establishes how the second eigenvalue relates to homophily and we have already established how 
consensus and mixing time relate to the second eigenvalue. 

Theorem 4. In the equal-sized islands model, if $p(n)n/\log^2(n) \to \infty$ then 

$$|\lambda_2(T(A(P,n))) - EDWH(m, p_s, p_d)| \xrightarrow{p} 0. \qquad (5)$$

Theorem 4 provides us with limiting expressions for the second eigenvalue as a function 
of homophily. 
Let 

$$H = \frac{p_s}{p}$$

capture how much more probable a link to own type is compared to other types, and 

$$h = \frac{p_s}{mp}$$

be the relative fraction of links to own type. If we index a sequence of societies by their 
cardinalities $n$, then the following lemma establishes the relation between degree weighted 
homophily and the relative probabilities of links and the number of islands. 

Lemma 5. In the islands model with $m \geq 2$ equal-sized groups and probabilities of links 
within and across types $p_s$ and $p_d$, respectively, the degree weighted homophily is 

$$EDWH(m, p_s, p_d) = \frac{p_s - p_d}{p_s + (m-1)p_d} = \frac{H-1}{m-1}.$$

Moreover, 

$$EDWH(m, p_s, p_d) = EDWH(M; m, p_s, p_d)$$

for all $M \in I(n)$, so the grouping of the islands is irrelevant in calculating the homophily. If 
the number of islands $m(n)$ diverges then $|EDWH(m, p_s, p_d) - h| \to 0$. 
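
The closed form in Lemma 5 can be checked by direct arithmetic. The sketch below assumes $EW_{M,M} = [p_s k + p_d k(k-1)]/(d^2 k^2)$ for a collection of $k$ islands, $EW_{M,M^c} = p_d/d^2$, and $d = pn$ with $p = (p_s + (m-1)p_d)/m$; the function names and the parameter values are hypothetical.

```python
def edwh_collection(k, m, p_s, p_d, n):
    """EDWH(M; m, p_s, p_d) when M is a collection of k of the m equal-sized islands."""
    p = (p_s + (m - 1) * p_d) / m
    d = p * n  # expected degree, the same for every node
    kc = m - k  # number of islands in the complement
    ew_mm = (p_s * k + p_d * k * (k - 1)) / d**2 / k**2
    ew_cc = (p_s * kc + p_d * kc * (kc - 1)) / d**2 / kc**2
    ew_mc = p_d / d**2
    size_m, size_c = k * n / m, kc * n / m
    # (1/|M|^2) * sum_{i in M} 1/d  +  (1/|M^c|^2) * sum_{i in M^c} 1/d
    denominator = (size_m / d) / size_m**2 + (size_c / d) / size_c**2
    return (ew_mm + ew_cc - 2 * ew_mc) / denominator


def edwh_closed_form(m, p_s, p_d):
    """(p_s - p_d) / (p_s + (m - 1) p_d), which equals (H - 1)/(m - 1)."""
    return (p_s - p_d) / (p_s + (m - 1) * p_d)


m, p_s, p_d, n = 5, 0.2, 0.05, 1000
closed = edwh_closed_form(m, p_s, p_d)
for k in range(1, m):
    print(k, round(edwh_collection(k, m, p_s, p_d, n), 6), round(closed, 6))
```

Every choice of $k$ gives the same value, which is the sense in which the grouping of the islands is irrelevant.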

From Theorem 4 and Lemma 5, the following corollary, characterizing second eigenvalues 
in terms of traditional measures of homophily in the islands model, follows immediately. 

Corollary 3. In the islands model with $m \geq 2$ equal-sized groups, if $p(n)n/\log^2(n) \to \infty$ 
and probabilities of links within and across types are $p_s$ and $p_d$, respectively, then 

$$\left|\lambda_2(T(A(P,n))) - \frac{H-1}{m-1}\right| \xrightarrow{p} 0. \qquad (6)$$

If the number of islands diverges, then $|\lambda_2(T(A(P,n))) - h| \xrightarrow{p} 0$. 



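Combining Lemmas 3 and 4 with Corollary 3, we have the following summary of the 
asymptotic behavior of consensus and mixing time in the islands model, which is Theorem 2. 

Corollary 4. In the equal-sized islands model, if $p(n)n/\log^2(n) \to \infty$ and $h(n)$ is bounded 
away from 1, then for any $\delta < 1$, with a probability going to 1: 

$$\frac{(1-\delta)\log(n)}{2\log\left(\left|\frac{m-1}{H-1}\right|\right)} \leq CT(\gamma/n^2; A(P,n)) \leq \frac{(1+\delta)\log(n)}{\log\left(\left|\frac{m-1}{H-1}\right|\right)}$$

and 

$$\frac{(1-\delta)\log(n)}{\log\left(\left|\frac{m-1}{H-1}\right|\right)} \leq MT(\gamma/n; A(P,n)) \leq \frac{\frac{3}{2}(1+\delta)\log(n)}{\log\left(\left|\frac{m-1}{H-1}\right|\right)}.$$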

The condition that $h$ is bounded away from 1 rules out the case that all but a vanishing 
fraction of links are within islands. If that is the case, then the islands can become disconnected 
with a nontrivial probability and the mixing and consensus times diverge. Setting 
$\varepsilon = \gamma/n^2$ for consensus time and $\varepsilon = \gamma/n$ for mixing time ensures 
that the convergence is within the order of magnitude of the weight on any given node. We 
can rewrite these expressions as 

$$\frac{(1-\delta)\log(n)}{2|\log(EDWH(m, p_s, p_d))|} \leq CT(\gamma/n^2; A(P,n)) \leq \frac{(1+\delta)\log(n)}{|\log(EDWH(m, p_s, p_d))|}$$

and 

$$\frac{(1-\delta)\log(n)}{|\log(EDWH(m, p_s, p_d))|} \leq MT(\gamma/n; A(P,n)) \leq \frac{\frac{3}{2}(1+\delta)\log(n)}{|\log(EDWH(m, p_s, p_d))|}.$$

Corollary 4 provides us with a fairly precise relationship between homophily and consensus 
and mixing times, as further discussed in previous sections. 

The above results presume equal sized islands. We also provide a result for unequal 
sizes for the case of two islands. This shows that the degree weighted homophily bound in 
Proposition 1 is tight. 

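Theorem 5. Suppose $\mathbf{n} = (\lfloor \mu n \rfloor, \lceil (1-\mu) n \rceil)$ with $0 < \mu < 1$ and

$$P = \begin{pmatrix} p_s & p_d \\ p_d & p_s \end{pmatrix},$$

where all the entries of this matrix are positive. Then

$$\operatorname*{plim}_{n \to \infty} \lambda_2(T(A(P,\mathbf{n}))) = [\,\ldots\,] = \operatorname*{plim}_{n \to \infty} \mathrm{DWH}(N_1; A(P,\mathbf{n})),$$

where $N_1$ denotes the first $\lfloor \mu n \rfloor$ nodes.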
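To illustrate the kind of limit that Theorem 5 describes, one can compute the second eigenvalue of the two-type "representative agent" matrix directly. This is a sketch under an assumption not restated in this section: that the relevant matrix is $Q_{k\ell} = n_\ell P_{k\ell} / \sum_{\ell'} n_{\ell'} P_{k\ell'}$, evaluated at group shares $\mu$ and $1-\mu$.

$$Q = \begin{pmatrix} \dfrac{\mu p_s}{\mu p_s + (1-\mu) p_d} & \dfrac{(1-\mu) p_d}{\mu p_s + (1-\mu) p_d} \\[2ex] \dfrac{\mu p_d}{\mu p_d + (1-\mu) p_s} & \dfrac{(1-\mu) p_s}{\mu p_d + (1-\mu) p_s} \end{pmatrix}.$$

Because the rows of $Q$ sum to 1, its eigenvalues are 1 and $\operatorname{tr}(Q) - 1$, so

$$\lambda_2(Q) = \frac{\mu p_s}{\mu p_s + (1-\mu) p_d} + \frac{(1-\mu) p_s}{\mu p_d + (1-\mu) p_s} - 1 = \frac{\mu (1-\mu)\,(p_s^2 - p_d^2)}{\left(\mu p_s + (1-\mu) p_d\right)\left(\mu p_d + (1-\mu) p_s\right)}.$$

As a check, at $\mu = 1/2$ this reduces to $(p_s - p_d)/(p_s + p_d)$, which agrees with $(p_s - p_d)/(p_s + (m-1)p_d)$ for $m = 2$ equal-sized islands.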

5 Consensus and Mixing Times in the 
Adolescent Health Data 

In this section, we examine consensus and mixing times in 84 social networks from the
Adolescent Health dataset and show how the patterns in that data illustrate our conclusions.
For each of the 84 schools, the dataset includes information on each student's grade,
gender and race. In addition, each student was asked to name his or her closest male and
female friends. Using the reported friendship networks (linking two individuals if either
named the other as a friend) we compute consensus and mixing times. We can also examine
traditional and degree weighted homophily measures based on the observed characteristics:
grade, race and sex. Grade is the year in school, and ranges from 6 to 12, as most of the
schools include 6 different years of students. Race is self-reported as Asian, black, Hispanic,
white, or other (and these were the only categories permitted). Sex is self-reported as male
or female.

It is worth commenting briefly on the nature of the illustration that our empirical compu- 
tations provide for the theoretical work. There are actually three aspects to the full spectrum 
of our results that can be tested. 

• First, there is a question of whether substantial information about the impact of
homophily can be captured by examining relatively simple definitions of types.

• Second, many of our results are asymptotic and there is a question about whether our
bounds on how consensus time and mixing time relate to homophily will be useful in
finite samples of medium size.

• Third, there is a question of whether or not people actually communicate in ways
that are captured by the learning model and random Markov models that underlie
consensus and mixing time.

Note on the Add Health data: Add Health is a program project designed by J. Richard Udry,
Peter S. Bearman, and Kathleen Mullan Harris, and funded by a grant P01-HD31921 from the
National Institute of Child Health and Human Development, with cooperative funding from 17
other agencies. Persons interested in obtaining data files from Add Health should contact
Add Health, Carolina Population Center, 123 W. Franklin Street, Chapel Hill, NC 27516-2524
(addhealth@unc.edu). We thank James Moody for making available the data organized in
Pajek files for the 84 schools.

Note on the friendship nominations: The number of friends reported was capped at five of
each type, or ten in total. Less than ten percent of the students hit the caps, but that still
censors the data. This design feature makes homophilies computed based on gender somewhat
less reliable than the others, since it would tend to equalize the numbers of reported male
and female friends, even if there were strong homophily present.

Our empirical analysis answers the first two questions in the affirmative. In particular, 
the multi-type random network model is a good fit for these social networks when it comes 
to investigating consensus and mixing times, and gets a good deal of explanatory power 
from very basic definitions of types. Our main claim — that the study of the convergence 
of Markovian processes on large networks can be reduced to simple computations about 
homophily — is not merely an asymptotic, theoretical claim, but one that holds up well 
when applied to the data. To see this more clearly, consider some of the typical building 
blocks that went into establishing the relationship between convergence and homophily: 
Lemma 3, Theorem 3, and Corollary 4, for example. Each result either provides inequalities
or statements about asymptotic convergence. A priori, the data might be badly behaved 
with respect to either of these. It might stay within the inequalities in a noisy way (oscillating 
randomly between the bounds). It might also take very large networks for the asymptotic 
results to kick in. Lastly, it might even be the case that the multi-type random graph model 
captures none of the salient structure of these social networks. If any of these happened, then 
there might be very weak, nonexistent, or "wrong way" correlations in the quantities that 
we studied empirically in this section. The fact that the correlations are quite strong and 
correspond to our predictions shows that the relationships suggested by the inequalities and 
asymptotic theory are relevant. In particular, these high school friendship networks seem to 
have many of the salient features of well-behaved multi-type random graphs, and much can 
be captured with simple definitions of types. 

Whether or not these models of updating and communication shed light on actual social 
behavior - that is, on how people actually communicate in or navigate networks - is obviously 
an important question, but one that requires additional (longitudinal) data and is left for 
future investigations. 

The first step is to test the relationship summarized in Theorem 2 between convergence
and homophily in the equal-sized islands model. We use the building blocks of the theorem
to rearrange it into an inverted form that compensates for the extreme behavior of the
convergence times at high homophilies and makes the quantities amenable to linear regressions.
In particular, we define

$$\rho(X) = \exp\left(-\frac{\log(n)}{X}\right).$$

By Lemmas 3 and 4, $\rho(\mathrm{CT}(\gamma/n^2; A))$ and $\rho(\mathrm{MT}(\gamma/n; A))$ are approximately the second
eigenvalue of $T(A)$, which, by Corollary 3, is well approximated by $\frac{mh-1}{m-1}$. For instance, if $X$
is an empirical measurement of a consensus time for some choice of $\varepsilon = \gamma/n^2$, then $\rho(X)$
can be thought of as an imputed per-step rate of convergence. Thus, we run regressions of
$\rho(\mathrm{CT}(\gamma/n^2; A))$ and $\rho(\mathrm{MT}(\gamma/n; A))$ on $\frac{mh-1}{m-1}$.
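The transformation can be sketched in a few lines. The code below assumes the form $\rho(X) = \exp(-\log(n)/X)$ and the approximation that a convergence time is about $\log(n)/\log(1/\lambda_2)$; the function names `imputed_rate` and `predicted_time` are illustrative and do not come from the paper.

```python
import math


def imputed_rate(convergence_time: float, n: int) -> float:
    """rho(X) = exp(-log(n) / X): imputed per-step rate of convergence."""
    if convergence_time <= 0 or n < 2:
        raise ValueError("need a positive time and at least two nodes")
    return math.exp(-math.log(n) / convergence_time)


def predicted_time(rate: float, n: int) -> float:
    """Inverse map: the time log(n) / log(1/rate) implied by a per-step rate."""
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must lie strictly between 0 and 1")
    return math.log(n) / math.log(1.0 / rate)
```

The two functions are inverses of each other for a fixed `n`, so very large convergence times map to rates just below 1 rather than to extreme values. That compression is what makes the transformed quantity easier to handle in a linear regression.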

The regressions include an intercept term. This is not in Theorem 2 or its constituent
parts. However, it turns out that if there is additional homophily within each island on
dimensions not reported in the data, then there will be an intercept term in the model.
Details are in Section 7.3.1.

Lastly, we removed two data points whose consensus and mixing times exceeded our
algorithms' capacity. These networks (schools number 53 and 57) had very large consensus
times (on the order of several thousand), so computing them precisely was infeasible. These
would not substantially change the results of the first two regressions, since for those
purposes, a consensus or mixing time of about 1000 is essentially infinite. So, from now on, we
work with the 82 data points excluding those schools. The results are presented in Table 2
and Figure 3.

Results for mixing time are very similar to those for consensus time, and so we collect 
the analogous results for mixing time in the appendix. The fact that consensus time and 
mixing time show similar results is not surprising, given that they both involve measures 
of distance between $T(A)$ and its limit. To see this directly, we note the tight relationship
between the two in Figure 2.

We begin with the finest definition of type available in the data. Thus, we consider a
"type" to be a specific combination of race, grade, and sex: so for instance a type would
be all female Asians in grade 9. Thus, in a high school with two sexes, four races, and four
grades there are thirty-two types.

The $R^2$ in the above regression shows that the homophily among these types accounts
for roughly a quarter of the variation in consensus and mixing times in the data. This is
reasonably high in view of the fact that many qualities that determine network formation -
such as interests, extracurricular activities, etc. - are not captured by these data.

We also explore how much of the variation can be explained by even simpler definitions
of types. For example, grade, out of the three characteristics captured in the data, has
the greatest variation in homophily. The grades also have approximately equal sizes in most
of the schools, so that it is legitimate to use the formulas from the equal-sized islands model.
The results are then reported in Table 3 and Figure 4.

Figure 2: The relationship between consensus and mixing times.

Table 2: Dependent variable = $\rho(\mathrm{CT}(0.1/n^2; A))$ ($N = 82$)

Variable                          Coefficient   (t-statistic)
Intercept                         0.870         (61.2)
$\frac{mh-1}{m-1}$ for "type"     0.297         (4.91)
$R^2$                             0.231

Figure 3: The relationship between $\rho(\mathrm{CT}(0.1/n^2; A))$ and $\frac{mh-1}{m-1}$ computed based on the
finest-grained type data available (i.e., a type is a race-grade-sex tuple).

Table 3: Dependent variable = $\rho(\mathrm{CT}(0.1/n^2; A))$ ($N = 82$)

Variable                          Coefficient   (t-statistic)
Intercept                         0.809         (32.6)
$\frac{mh-1}{m-1}$ for grade      0.209         (5.21)
$R^2$                             0.253

The fit is similar in quality to that obtained from the finest definitions of type.

We can also examine some other implications of the theory. By focusing on Theorem 4
instead of Theorem 2, we can replace $\frac{mh-1}{m-1}$ in the above regressions by DWH. The increase in
explanatory power comes from the fact that DWH does not presume equal-sized groups, and
thus captures the fact that different islands may be of different sizes. We compute DWH
based on the three different observed characteristics in the data and use them to estimate
$\rho(\mathrm{CT}(\gamma/n^2; A))$ and $\rho(\mathrm{MT}(\gamma/n; A))$. For a given definition of type, we take the maximum DWH
over all nontrivial partitions that never separate two agents of the same type. For example,
$\mathrm{DWH}_{\text{grade}}(A)$ is the maximum DWH taken over all partitions which have some grades on one
side and the rest of the grades on the other. In the tables below, we refer to this quantity as
Grade DWH, and similarly with other type classifications. More generally, given some definition
$\theta$ of type such that $\theta : N \to C$ maps agents to types, we define

$$\mathrm{DWH}_\theta(A) := \max_{\substack{\emptyset \subsetneq M \subsetneq N \\ \theta(M) \cap \theta(M^c) = \emptyset}} \mathrm{DWH}(M; A).$$

For this analysis, we do not compute DWH for types, as with thirty or so different groups,
the number of different partitions is such that the computations become infeasible.
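The maximization over type-respecting partitions can be sketched as follows. This is a minimal illustration, not the authors' implementation: the DWH of a set of agents is defined earlier in the paper and is passed in here as a hypothetical callable `dwh`, and the names `type_respecting_sides` and `dwh_theta` are likewise assumptions.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Hashable, Iterator


def type_respecting_sides(theta: Dict[Hashable, Hashable]) -> Iterator[FrozenSet]:
    """Yield every nonempty proper set M of agents that never splits a type.

    theta maps each agent to its type. Each M is a union of whole types, so
    with k types there are 2**k - 2 candidate sets.
    """
    types = sorted(set(theta.values()), key=repr)
    for size in range(1, len(types)):  # skip the empty set and the full set
        for chosen in combinations(types, size):
            chosen_set = set(chosen)
            yield frozenset(a for a, t in theta.items() if t in chosen_set)


def dwh_theta(theta: Dict[Hashable, Hashable],
              dwh: Callable[[FrozenSet], float]) -> float:
    """Maximum of dwh(M) over all type-respecting sides M.

    Raises ValueError when there is only one type, since no nontrivial
    partition exists in that case.
    """
    return max(dwh(side) for side in type_respecting_sides(theta))
```

The count $2^k - 2$ is why the exhaustive approach is practical for grade, gender, or race, where $k$ is small, but not for race-grade-sex tuples: thirty types already give about a billion candidate sets.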

Here we are measuring realized degree weighted homophily, DWH, rather than its "expected"
analogue, EDWH. This is because EDWH is not available to us, so, as usual, we
replace it by the sample analogue. As shown by the following lemma, this is valid
asymptotically.

Figure 4: The relationship between $\rho(\mathrm{CT}(0.1/n^2; A))$ and $\frac{mh-1}{m-1}$ for grade.

Lemma 6. Consider the islands model with $m(n) \geq 2$ equal-sized groups where $m(n)/n \to 0$,
and probabilities of links within and across types $p_s$ and $p_d$, respectively, and consider any
sequence of groupings of islands $M(n) \in I(n)$. Then

$$|\mathrm{EDWH}(M(n); m, p_s, p_d) - \mathrm{DWH}(M(n); A(P,n))| \xrightarrow{p} 0,$$

and so

$$|\mathrm{EDWH}(m, p_s, p_d) - \mathrm{DWH}(M(n); A(P,n))| \xrightarrow{p} 0.$$

Regressions of convergence rates on the DWH for the 82 networks are reported in Table 4.
Here, we run the regressions with an intercept term, which is motivated by the same idea as 
the one formally justifying the inclusion of an intercept in the first regressions of this section. 
We have not worked out the details formally. Moreover, one of our regressions includes three 
DWH explanatory variables - one for each dimension. Such an additively separable form is 
not justified by the theory but seems to track the data quite closely, so we include it here to 
point out a potential relationship which may be fruitful to examine further. 



Table 4: Dependent variable = $\rho(\mathrm{CT}(0.1/n^2; A))$ ($N = 82$)
Entries are coefficients, with t-statistics in parentheses.

Variable       All Homophilies   Grade Only      Gender Only     Race Only
Intercept      0.644 (20.7)      0.716 (21.52)   0.886 (56.4)    0.916 (88.1)
Grade DWH      0.347 (7.75)      0.330 (6.64)
Gender DWH     0.137 (2.64)                      0.231 (3.36)
Race DWH       0.105 (5.04)                                      0.0663 (2.29)
$R^2$          0.545             0.356           0.123           0.0617



Consider Table 4. The first regression includes all the homophilies acting as independent
variables, and the others have each type of homophily used as an explanatory variable on its
own. The table shows two main things. First, all three homophilies are significant at the 2%
level when the regression is run with all three explanatory variables. Second, grade homophily
is doing most of the work in explaining the variation in convergence times; other kinds of
homophily have a significant effect, but 36% of the variation can be explained by ignoring all
but the grade information. This is illustrated in Figure 5, where we plot $\rho(\mathrm{CT}(0.1/n^2; A))$
versus grade homophily and draw the least-squares trend line corresponding to the second
column of Table 4.




Figure 5: Rate of convergence of consensus time for the 82 friendship networks plotted 
against the degree weighted homophily in each of the networks calculated relative to grade. 



While the above analysis has examined how imputed rates of convergence are affected by
homophily, we could work with the consensus times and mixing times directly and compare
them to the prediction of Theorem 2, which states that they should be approximately
proportional to $\log(n)/\log\left(\frac{m-1}{mh-1}\right)$. When we perform such an analysis, we find results consistent
with the theory, as pictured in Figure 6, where the slope coefficient is significant at levels
well below .001 and the intercept is constrained to be 0. However, the $R^2$ in this regression
is low (0.06) because some extreme data points contribute very large error under this
parameterization. It was for this reason that we changed the axes in the above regressions; the
rescaling makes errors comparable across data points.

Lastly, we test the prediction of Theorem 1 by computing the average shortest path
length in each network and running a regression of this on $\log(n)/\log(d(A))$, which is what
the theorem predicts the quantity will depend on, as well as on $-\log n/\log\left(\frac{mh-1}{m-1}\right)$ (computed
based on full-type homophilies), which is supposed to predict consensus and mixing times
but not shortest paths. The results are presented in Table 5.

Figure 6: The relationship between consensus time and the prediction of Theorem 2 for
grade.



Table 5: Dependent variable = average shortest path length ($N = 82$)
Entries are coefficients, with t-statistics in parentheses.

Variable                                               Density and Homophily   Density Only
Intercept                                              -0.125 (-1.08)          -0.106 (-0.846)
$\log(n)/\log(d(A))$                                   1.27 (33.2)             1.32 (32.9)
$-\log n/\log\left(\frac{mh-1}{m-1}\right)$ for type   0.00981 (3.79)
$R^2$                                                  0.942                   0.931



The coefficient on the homophily regressor, while significant at conventional levels, has a
much smaller t-statistic than the coefficient on $\log(n)/\log(d(A))$. Also, since the coefficient
on the homophily regressor is about 0.01 and the values of $-\log n/\log\left(\frac{mh-1}{m-1}\right)$ are between 1.58
and 66.7 in the data, the overall predictive power of the homophily term is small. This is also
seen in the difference between the two regressions in Table 5, where dropping the homophily
regressor results in only a one percent change in the $R^2$. This shows, as predicted,
that network density matters much more for shortest path lengths than homophily does.
The relationship between shortest path and the predicted explanatory variable $\log(n)/\log(d)$
is pictured in Figure 7.

6 Concluding Remarks 

Our results are built in several parts:

(i) we relate communication processes to second eigenvalues largely building on standard
spectral theory,

(ii) we provide novel results relating second eigenvalues to homophily,

(iii) we provide novel results relating homophily to random graphs, and

(iv) finally, combining these results enables us to relate communication to homophily in
random networks.

Figure 7: The relationship between average shortest path length and the prediction of
Theorem 1 (average shortest path plotted against log(n)/log(d), with a linear fit).

Our results show that homophily can substantially affect communication processes but 
that this depends both on the level and type of homophily and the type of communication. 
Communication based on shortest paths is essentially unaffected by homophily, while random 
walks and updating by averaging are affected in a well-identified and nonlinear manner. The 
underlying reason is that homophily does not change shortest paths, but affects the relative 
numbers of paths between nodes of different types. Interestingly, there is a complete reversal 
in the manner in which communication depends on the structure of connections: 

• The speed of shortest path communication depends on link density but not homophily. 

• The speed of Markovian processes (weighted averaging and random walks) depends on 
homophily but not link density. 

The methods used to arrive at these conclusions may be of some independent interest 
for empirical work. In particular, we have shown that second eigenvalues and convergence 
times of a stochastic matrix arising from a large multi-type random network can be predicted 
very accurately from a much smaller matrix that only records relative linking probabilities 
between types. Thus, instead of attempting to obtain reliable data on an entire large network, 
which is difficult if not impossible, one is justified in using random sampling to estimate the 
matrix of probabilities. This approach naturally raises the question of what other global 
properties of large networks can be estimated accurately using convenient projections of the 
data which avoid collecting too much local information; this is a potential avenue of further 
research. 

We have also examined a set of 82 networks to see how the communication processes 
would operate on these networks, and how that relates to the observed homophily of the 
networks. The results show significant relationships that are as predicted by the theory, 
with increased homophily leading to increased consensus and mixing times according to the 
predicted formulas. 

Our results suggest the importance of understanding homophily in order to understand 
communication and the functioning of a society. This is, of course, a first step and suggests 
many avenues for further research, of which we mention only the most obvious ones. Consid- 
ering other sorts of communication, learning, diffusion, and interaction and examining other 
data will give a fuller understanding of homophily's role. For example, an interesting area 
to explore and to compare results with would be coordination and other games on networks,
where it has been found that network structure can affect both the strategic choices (e.g.,
Morris (2000), Young (1998), Jackson (2008b)) and the speed of convergence (e.g., Ellison
(2000) and Montanari and Saberi (2008)).
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7 Appendix: Proofs 

7.1 Background on Reversible Markov Chains 

For completeness and to fix notation, we review very well-known results about Markov
chains and self-adjoint matrices which form the foundation for our measures of convergence
to consensus and bounds on the time required to converge. None of the material in this
section is original; further background and references on these techniques can be found in
Diaconis and Stroock (1991).

Symmetry or self-adjointness is often a useful property to have when working with
eigenvalues and other spectral quantities of a matrix. While $T(A)$ generally will not be symmetric,
we can make it into a self-adjoint operator under a well-chosen inner product, as in Diaconis
and Stroock (1991). For this we need a few definitions.

Given a probability distribution $\pi$ on $N$, define

$$\langle v, w \rangle_\pi = \sum_{i} v_i w_i \pi_i.$$

This is just the Euclidean inner product weighted by the entries of the distribution.

Definition 5. A stochastic matrix $T$ satisfies detailed balance (equivalently, is reversible)
relative to the distribution $\pi$ over the nodes if, for every $i, j \in N$, we have

$$\pi_i T_{ij} = \pi_j T_{ji}.$$

Proposition 2. If $T$ satisfies detailed balance relative to $\pi$ then $\pi$ is a stationary
distribution for $T$.

Proof of Proposition 2. Observe that

$$\sum_i \pi_i T_{ij} = \sum_i \pi_j T_{ji} = \pi_j \sum_i T_{ji} = \pi_j,$$

where the first equality uses the definition of detailed balance and the last uses the fact
that the rows of $T$ sum to 1. ∎

Proposition 3. The stochastic matrix $T$ satisfies detailed balance relative to $\pi$ if and only
if $T$ is self-adjoint under the inner product $\langle \cdot, \cdot \rangle_\pi$.

Proof of Proposition 3. Assume that detailed balance is satisfied. Take $v = \delta_i$ and
$w = \delta_j$ for some $i, j \in N$, i.e. take two standard basis vectors. Then

$$\langle Tv, w \rangle_\pi = T_{ji} \pi_j$$

and

$$\langle v, Tw \rangle_\pi = T_{ij} \pi_i.$$

These two quantities are equal by detailed balance and this equality extends to arbitrary $v$
and $w$ because the inner product is a bilinear form.

For the converse direction, the equality $\langle T\delta_i, \delta_j \rangle_\pi = \langle \delta_i, T\delta_j \rangle_\pi$ for standard basis vectors
is guaranteed as a consequence of $T$ being self-adjoint, and this immediately gives detailed
balance by the simple calculation above. ∎

The next claim is that $T(A)$ satisfies detailed balance relative to $s(A)$, which is defined
by

$$s_i(A) = \frac{d_i(A)}{\sum_{j} d_j(A)}.$$

This is immediate to check from the definitions. Thus, $s(A)$ is the stationary distribution of
$T(A)$ and, moreover, $T(A)$ is self-adjoint relative to $\langle \cdot, \cdot \rangle_{s(A)}$. As a result, the eigenvalues of
$T(A)$ are all real. Let $1 = \lambda_1(T(A)), \ldots, \lambda_n(T(A))$ denote these eigenvalues ordered from
greatest to least by magnitude, and

$$1 = \beta_1(T(A)) \geq \beta_2(T(A)) \geq \beta_3(T(A)) \geq \cdots \geq \beta_n(T(A)) \geq -1$$

denote these same eigenvalues ordered from greatest to least as real numbers. Obviously

$$|\lambda_2(T(A))| = \max\{|\beta_2(T(A))|, |\beta_n(T(A))|\}.$$

Now, since $T(A)$ is self-adjoint, we can use the powerful Courant-Fischer variational
characterization of the eigenvalues of a Hermitian matrix. Let $e$ denote the unit column
vector of ones.

Proposition 4.

$$\beta_n(T(A)) = \inf_{0 \neq v \in \mathbb{R}^n} \frac{\langle v, T(A) v \rangle}{\langle v, v \rangle} \tag{7}$$

$$\beta_2(T(A)) = \sup_{\substack{0 \neq v \in \mathbb{R}^n \text{ s.t.} \\ \langle v, e \rangle = 0}} \frac{\langle v, T(A) v \rangle}{\langle v, v \rangle} \tag{8}$$

where the inner product everywhere is $\langle \cdot, \cdot \rangle_{s(A)}$.

This says that the smallest eigenvalue (under the real number ordering) minimizes the 
normalized quadratic form in braces, where v ranges over all nonzero vectors. Moreover, the 
second largest eigenvalue (under the real number ordering) maximizes the same quantity but 
where v ranges over all nonzero vectors orthogonal to the right eigenvector corresponding to 
the largest eigenvalue. 
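The claims about detailed balance and stationarity are easy to check numerically on a small example. The sketch below assumes, as elsewhere in the paper, that $T(A) = D(A)^{-1}A$ for a symmetric adjacency matrix $A$ and that $s(A)$ is proportional to the degrees; the helper names are hypothetical, and every node is assumed to have positive degree.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def random_walk_matrix(A: Matrix) -> Tuple[Matrix, List[float]]:
    """Return T = D^{-1} A and the degree-proportional distribution s."""
    degrees = [sum(row) for row in A]
    T = [[a / d for a in row] for row, d in zip(A, degrees)]
    total = sum(degrees)
    s = [d / total for d in degrees]
    return T, s


def satisfies_detailed_balance(T: Matrix, pi: List[float], tol: float = 1e-12) -> bool:
    """Check pi_i * T_ij == pi_j * T_ji for every pair (i, j)."""
    n = len(T)
    return all(abs(pi[i] * T[i][j] - pi[j] * T[j][i]) <= tol
               for i in range(n) for j in range(n))


def is_stationary(T: Matrix, pi: List[float], tol: float = 1e-12) -> bool:
    """Check that pi is a left fixed point of T: sum_i pi_i * T_ij == pi_j."""
    n = len(T)
    return all(abs(sum(pi[i] * T[i][j] for i in range(n)) - pi[j]) <= tol
               for j in range(n))
```

For a symmetric $A$, the product $s_i T_{ij}$ equals $A_{ij}$ divided by the sum of all degrees, which is symmetric in $i$ and $j$. That is the one-line reason the first check succeeds. A directed cycle with the uniform distribution shows that the converse of Proposition 2 fails: the distribution is stationary, but detailed balance does not hold.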

7.2 Proofs and Additional Material for the Main Results 
7.2.1 Relating Consensus Times to Second Eigenvalues 

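Proof of Lemma 3. In the proof, we fix $A$ and drop it as an argument; we also drop the
argument on the eigenvalues, the matrix being fixed throughout.
We first show that

$$\mathrm{CT}(\varepsilon) \leq \left\lceil \frac{\log(1/\varepsilon)}{2\log(1/|\lambda_2|)} \right\rceil.$$

Take any $b \in [0,1]^n$. Let $U_i$ be the projection onto the eigenspace of $T$ corresponding to $\lambda_i$.
Note that under $\langle \cdot, \cdot \rangle_s$, these eigenspaces are orthogonal. Define $U = \sum_{i=2}^{n} U_i$. This is the
projection off the eigenspace corresponding to $\lambda = 1$. Then:

$$\begin{aligned}
\|(T^t - T^\infty) b\|_s^2 &= \Big\| \sum_{i=2}^{n} \lambda_i^t U_i b \Big\|_s^2 && \text{spectral theorem applied to the stochastic matrix} \\
&= \sum_{i=2}^{n} |\lambda_i|^{2t} \|U_i b\|_s^2 && \text{orthogonality of the spectral projections} \\
&\leq |\lambda_2|^{2t} \sum_{i=2}^{n} \|U_i b\|_s^2 \\
&= |\lambda_2|^{2t} \Big\| \sum_{i=2}^{n} U_i b \Big\|_s^2 && \text{orthogonality of the spectral projections} \\
&= |\lambda_2|^{2t} \|U b\|_s^2 && \text{definition of } U \\
&\leq |\lambda_2|^{2t} \|b\|_s^2 && \text{projections are contractions} \\
&\leq |\lambda_2|^{2t} \langle e, e \rangle_s && b \in [0,1]^n \text{ and definition of } \langle \cdot, \cdot \rangle_s \\
&= |\lambda_2|^{2t} && \text{entries of } s \text{ sum to 1.}
\end{aligned}$$

Thus, if

$$t \geq \frac{\log(1/\varepsilon)}{2\log(1/|\lambda_2|)},$$

then

$$\|(T^t - T^\infty) b\|_s^2 \leq \varepsilon,$$

from which the bound follows upon observing that $\mathrm{CT}(\varepsilon)$ must be an integer. This also
shows that when the second eigenvalue is identically 0, then consensus time must be 1.
Now we show that

$$\left\lfloor \frac{\log(s/4\varepsilon)}{2\log(1/|\lambda_2|)} \right\rfloor \leq \mathrm{CT}(\varepsilon).$$

Let $w$ be an eigenvector of $T$ corresponding to $\lambda_2$, scaled so that $\|w\|_s^2 = s/4$. Then the
maximum entry of $w$ is at most 1/2 and the minimum entry is at least $-1/2$. Consequently,
if we define $b = w + e/2$, then $b \in [0,1]^n$. Now, using the fact that $e$ is a right eigenvector
corresponding to $\lambda_1 = 1$ and spectral projections are orthogonal, it follows that:

$$\begin{aligned}
\|(T^t - T^\infty) b\|_s^2 &= |\lambda_2|^{2t} \|U_2 w\|_s^2 \\
&= |\lambda_2|^{2t} \|w\|_s^2 \\
&= \frac{s}{4} |\lambda_2|^{2t}.
\end{aligned}$$

Therefore, if

$$t \leq \frac{\log(s/4\varepsilon)}{2\log(1/|\lambda_2|)},$$

then

$$\|(T^t - T^\infty) b\|_s^2 \geq \varepsilon,$$

from which the remaining bound follows upon observing that $\mathrm{CT}(\varepsilon)$ must be an integer. ∎

(Footnote to the Courant-Fischer characterization in Section 7.1: Horn and Johnson (1985, pp. 176-178).)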
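The upper-bound argument can be checked by hand on the smallest nontrivial case. The sketch below uses a two-state chain, which is always reversible and for which the stationary distribution and second eigenvalue are known in closed form. It is an illustration of the inequality $\|(T^t - T^\infty)b\|_s^2 \leq |\lambda_2|^{2t}$ only, not a computation of consensus time on a real network, and all function names are hypothetical.

```python
import math
from typing import List, Tuple


def two_state_chain(a: float, b: float) -> Tuple[List[List[float]], List[float], float]:
    """Chain with T = [[1-a, a], [b, 1-b]]; returns (T, stationary pi, lambda_2)."""
    T = [[1.0 - a, a], [b, 1.0 - b]]
    pi = [b / (a + b), a / (a + b)]
    return T, pi, 1.0 - a - b


def apply_power(T: List[List[float]], x: List[float], t: int) -> List[float]:
    """Compute T^t x by repeated matrix-vector multiplication."""
    for _ in range(t):
        x = [sum(T[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return x


def sq_distance_from_limit(T, pi, x, t: int) -> float:
    """Squared pi-weighted norm of (T^t - T^infinity) x.

    T^infinity x is the constant vector whose entries equal sum_i pi_i x_i.
    """
    limit = sum(p * xi for p, xi in zip(pi, x))
    xt = apply_power(T, x, t)
    return sum(p * (xi - limit) ** 2 for p, xi in zip(pi, xt))


def upper_bound_steps(eps: float, lam2: float) -> int:
    """ceil(log(1/eps) / (2 log(1/|lambda_2|))); requires 0 < |lambda_2| < 1."""
    return math.ceil(math.log(1.0 / eps) / (2.0 * math.log(1.0 / abs(lam2))))
```

With `a = 0.1` and `b = 0.2` the second eigenvalue is 0.7, and the squared distance from the limit shrinks by a factor of $0.7^2$ per step for any starting vector in $[0,1]^2$, which is exactly the geometric decay that the chain of inequalities in the proof captures.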
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7.2.2 The Representative Agent Theorem and a Consequence 

Theorem 3 and Proposition 6 require related machinery, which we will develop and apply in
this section. First, we introduce some notation.

We drop the arguments on the random variable $A$. Let $D(A)$ denote the diagonal matrix
whose $(i, i)$ entry is $d_i(A)$. Let $R$ be the $n$-by-$n$ matrix given by $R_{ij} = P_{k\ell}$ if $i \in N_k$, $j \in N_\ell$.
Then the expected degree of node $i$ is $w_i := \sum_j R_{ij}$.

Let $V = \sum_i w_i$ be the sum of expected degrees and $v = \sum_i d_i$ the sum of realized
degrees.

For any matrix $T$, let $\|T\| = \sup_{\|v\|=1} \langle v, Tv \rangle$, where the inner product is the standard
Euclidean dot product. Let

$$J = D(A)^{-1/2} A D(A)^{-1/2} - v^{-1} D(A)^{1/2} E D(A)^{1/2}$$

and

$$K = D(R)^{-1/2} R D(R)^{-1/2} - V^{-1} D(R)^{1/2} E D(R)^{1/2}.$$

Now we note a fact from basic linear algebra.

Fact 1. $D(A)^{-1/2} A D(A)^{-1/2}$ and $T(A) = D(A)^{-1} A$ are similar matrices, so that they
have the same eigenvalues, and $v^{-1} D(A)^{1/2} E D(A)^{1/2}$ is the summand of the spectral
decomposition of $D(A)^{-1/2} A D(A)^{-1/2}$ corresponding to the eigenvalue 1. The same
reasoning applies when we replace $A$ by $R$ and $v$ by $V$.

We now state the proof of Theorem 3, or rather a reduction to a proposition which will
also be useful for proving Proposition 6.

Proof of Theorem 3. It is clear that $D(R)^{-1} R$ has the same eigenvalues as $Q$, so to prove
the claim it suffices to prove that the former matrix has second eigenvalue close enough to
that of $D(A)^{-1/2} A D(A)^{-1/2}$.

By Fact 1, we know that $\|J\|$ is the second largest eigenvalue in magnitude of the matrix
$D(A)^{-1/2} A D(A)^{-1/2}$, and $\|K\|$ is the second largest eigenvalue in magnitude of the matrix
$D(R)^{-1/2} R D(R)^{-1/2}$. Thus, by the triangle inequality, if we can show that with probability
at least $1 - \delta$ we have $\|J - K\| < \delta$, then the proof is done. This is the content of Proposition
5 below. ∎



Now we state a lemma from the proof of Theorem 3.6 of Chung, Lu, and Vu (2004), which
is a consequence of a Chernoff-type concentration inequality and is quite useful throughout
this section.

Lemma 7. Fix any $\delta > 0$. If $w_{\min}/\log n$ is high enough, the following statement holds with
probability at least $1 - \delta$ for all $i$ simultaneously: $|d_i - w_i| < \delta w_i$.

Proposition 5. If $w_{\min}/\log^2 n$ is high enough, then with probability at least $1 - \delta$ we have
$\|J - K\| < \delta$.



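Proof of Proposition 5. Write $J - K = B + C + L + M$, where

$$B_{ij} = \frac{A_{ij}}{\sqrt{d_i d_j}} \left( 1 - \frac{\sqrt{d_i d_j}}{\sqrt{w_i w_j}} \right),$$

$$C_{ij} = \frac{A_{ij} - R_{ij}}{\sqrt{w_i w_j}},$$

$$L_{ij} = \frac{\sqrt{w_i w_j}}{V} \left( 1 - \frac{\sqrt{d_i d_j}}{\sqrt{w_i w_j}} \right),$$

$$M_{ij} = (V^{-1} - v^{-1}) \sqrt{d_i d_j}.$$

By the triangle inequality,

$$\|J - K\| \leq \|B\| + \|C\| + \|L\| + \|M\|,$$

so it suffices to bound the pieces individually.

Now we list two lemmas, useful only in this proof, from Chung, Lu, and Vu (2004). The
proof of the first requires only minor modification in our setting.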

Lemma 8. Fix any 5 > 0. Then if w^^^j log^ n is high enough, with probability at least \ — 

2 ^ logn 



Proof of Lemma 8. The only step of the proof of this last lemma that does not work exactly
as in the proofs of Theorems 3.2 and 3.6 of Chung, Lu, and Vu (2004) is their equation (3.2).
This step asserts (in our notation) that for $m > 2$, we have



'1 — Rij)Rij + 



-i?,,)™(l 



Ri 



< 



Ri 



[w. 



;Wj)™/2 



< 



WiWj/V 



< 



1/V 



\m-2 



The step which is slightly different is the penultimate inequality. We must show that R^^ < 
WiWj/V. But note that Wj/V < 1 by definition and Wi = J2k > Rij- I 
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It follows that we have ||C|| < 6/ A with probability at least 1 — 5/4. 

Lemma 9. Fix any 5 > 0. If w^\^/ n is high enough, the following statement holds with 
probability at least 1 — 5; 

IIMII < ^ 



w 



It follows that we have $\|M\| < \delta/4$ with probability at least $1 - \delta/4$.
To bound $\|B\|$ and $\|L\|$, we will use Lemma 7 and two simple facts about the matrix
norm. Let $\mathrm{abs}(X)$ denote the matrix whose $(i, j)$ entry is $|X_{ij}|$.

Lemma 10. 1. For any matrix $X$, $\|X\| \leq \|\mathrm{abs}(X)\|$.

2. Suppose there are two nonnegative matrices, $X$ and $Y$, and a constant $c > 0$ such that
for each $i, j$, we have $Y_{ij} \leq c X_{ij}$. Then $\|Y\| \leq c\|X\|$.

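Proof of Lemma 10. For (1), note that for all $\|v\| = 1$, we have

$$\langle v, Xv \rangle = \sum_{i,j} v_i v_j X_{ij} \leq \sum_{i,j} |v_i| |v_j| |X_{ij}| \leq \|\mathrm{abs}(X)\|,$$

the last inequality being true because $\|\mathrm{abs}(v)\| = 1$. This proves the claim by definition of
the matrix norm.

For (2), note that for all $\|v\| = 1$, we have

$$\langle v, Yv \rangle = \sum_{i,j} v_i v_j Y_{ij} \leq \sum_{i,j} |v_i| |v_j| \, c X_{ij} = c \sum_{i,j} |v_i| |v_j| X_{ij} \leq c\|X\|,$$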

where again we have made use of the fact that $\|\mathrm{abs}(v)\| = 1$.

To show that, with probability at least $1 - \delta$, we have $\|B\| < \delta/4$, define $\bar{B} = \mathrm{abs}(B)$;
by Lemma 10(1) it suffices to show $\|\bar{B}\| < \delta/4$. Note

$$\bar{B}_{ij} = \left| 1 - \frac{\sqrt{d_i d_j}}{\sqrt{w_i w_j}} \right| \frac{A_{ij}}{\sqrt{d_i d_j}}.$$

By Lemma 7 we have with probability at least $1 - \delta/4$ that

$$\left| 1 - \frac{\sqrt{d_i d_j}}{\sqrt{w_i w_j}} \right| < \delta/4$$

and so, noting that

$$\|D(A)^{-1/2} A D(A)^{-1/2}\| = 1$$

and using Lemma 10(2), the claim is proved.

Precisely the same argument works to show that with probability at least $1 - \delta/4$, we
have $\|L\| < \delta/4$, with $V^{-1} D(R)^{1/2} E D(R)^{1/2}$, which also has norm 1, playing the role of
$D(A)^{-1/2} A D(A)^{-1/2}$.

Combining all the bounds shows that, with probability at least $1 - \delta$, we have $\|J - K\| < \delta$,
as desired. This completes the proof of the proposition. ∎

We will now use the results established so far in this section to prove a proposition that
tightens the lower bound in Lemma 3, so that second eigenvalues become an even better
proxy for consensus time.

Proposition 6. Suppose $(P, n)$ are such that, for all $n$,

1. There exist $\underline{\lambda}$ and $\bar{\lambda}$ so that $0 < \underline{\lambda} \leq \lambda_2(Q(P, n)) \leq \bar{\lambda} < 1$.

2. $\min_k n_k / n \geq \alpha > 0$.

3. $d_{\min}(P)/d_{\max}(P) \geq \beta > 0$.

Write $T = T(A)$. Then, for any $\delta > 0$, for high enough $n$, with probability at least $1 - \delta$,

$$\left\lfloor \frac{\log(1/8\varepsilon) - \log(1/\alpha\beta)}{2\log(1/|\lambda_2(T)|)} \right\rfloor - 1 \leq \mathrm{CT}(\varepsilon; A).$$

Combining this with Lemma 3, we can conclude that for any $\delta$, with probability at least
$1 - \delta$,

$$\left\lfloor \frac{\log(1/8\varepsilon) - \log(1/\alpha\beta)}{2\log(1/|\lambda_2(T)|)} \right\rfloor - 1 \leq \mathrm{CT}(\varepsilon; A) \leq \left\lceil \frac{\log(1/\varepsilon)}{2\log(1/|\lambda_2(T)|)} \right\rceil.$$

Thus, as we let $\varepsilon$ get small, we find that $\mathrm{CT}(\varepsilon; A)$ is proportional to $\frac{\log(1/\varepsilon)}{\log(1/|\lambda_2(T)|)}$.

The assumptions of this proposition could be weakened, but in their current form they
are simple to state and interpret. The first one says that the second eigenvalue of $Q(P, n)$
should have a magnitude that stays away from 0 and 1, and amounts to requiring that
consensus time is not going to 0 or $\infty$. The other conditions impose some balance on the
system. The second one says that no group should be getting negligibly small relative to
society. The third one says that maximum and minimum degrees should not get too far
apart proportionally. Some types are allowed to be much more popular than others, but not
infinitely so. If these conditions are met, then Lemma 3 can be strengthened so that the
lower bound is tighter, and still easy to compute.

Techniques similar to the ones used in the proof below can be applied to the study of
mixing times in order to tighten the upper bound in Lemma 4 in the multi-type random
graph setting.

Proof of Proposition 6. We will reuse the same variable names used inside the proof of
Proposition 5, but the variables defined for the whole subsection will be unchanged.

Write $C = D(R)^{-1}R$ and $T = T(A)$. That is, $C$ is the version of $T$ in the "expectations"
world. Also, let

\[
z = \left\lfloor \frac{\log(1/8\varepsilon) - \log(1/\alpha\beta)}{2\log(1/|\lambda_2(T)|)} \right\rfloor.
\]

There are three steps to the proof. In Step 1, we show that for $C^t\mathbf{b}$ to converge within $2\varepsilon$ of
its limit takes at least $z - 1$ steps for some $\mathbf{b}$. In Step 2, we use Proposition 5 to show that
for any $\eta > 0$, for high enough $n$, with probability at least $1 - \eta$, we have $\|T - C\| < \eta$. In
Step 3, we show that, if $\eta$ is chosen small enough, then $C^t\mathbf{b}$ and $T^t\mathbf{b}$ are at most $\varepsilon$ apart for
at least $z - 1$ steps under the inner product $\langle \cdot, \cdot \rangle_{s(A)}$. This shows the requisite result.

Step 1. Let $\mathbf{v}$ be a right eigenvector of $C$ corresponding to eigenvalue $\lambda_2 := \lambda_2(Q)$ (this
is also the second eigenvalue in magnitude of $C$ by Fact 1). If we multiply $\mathbf{v}$ by a constant
scalar, we may assume that the entry with largest magnitude is 1/2. By Assumption 1, $\lambda_2$
is nonzero. Given this and the fact that $C$ is constant on a given type, it follows that $\mathbf{v}$ is
constant on a given type. Thus, by Assumption 2, there are at least $\alpha n$ entries in $\mathbf{v}$ equal
to 1/2. And from this it follows, by the definition of $s(C)$ and Assumption 3, that

\[
\langle \mathbf{v}, \mathbf{v} \rangle_{s(C)} \geq n\alpha \cdot \frac{1}{4} \cdot \frac{\beta}{n} = \frac{\alpha\beta}{4}.
\]

Setting $b_i = v_i + 1/2$, we see as at the end of the proof of Lemma 3 that

\[
\|C^t\mathbf{b} - C^\infty\mathbf{b}\|^2_{s(C)} \geq \frac{\alpha\beta}{4}\,|\lambda_2(C)|^{2t},
\]

which yields the lower bound on convergence time we want with $C$ instead of $T$. But in
view of Assumption 1 and Theorem 3, for high enough $n$ we can replace $C$ by $T$ and lose at
most an additive factor of 1 in the bound.

Step 2. Recall $C = D(R)^{-1}R$ and $T = T(A)$. Also put

\[
L = D(A)^{-1/2} J D(A)^{1/2}
\]

and

\[
M = D(R)^{-1/2} K D(R)^{1/2}.
\]

By Fact [U we have

\[
T - C = v^{-1}ED(A) - \nu^{-1}ED(R) + L - M.
\]

So by the triangle inequality, it suffices to bound $\|v^{-1}ED(A) - \nu^{-1}ED(R)\|$ and $\|L - M\|$.
By Lemma 7, if $w_{\min}/\log^2 n$ is high enough, the following event occurs with probability
at least $1 - \gamma$ for all $i$ simultaneously: $|d_i - w_i| < \gamma w_i$. Call this event $E_1$. Given the
assumptions, high enough $n$ ensures the condition of the lemma is met. Thus, on $E_1$,

\[
\|v^{-1}ED(A) - \nu^{-1}ED(R)\| < \gamma,
\]

and so it suffices to take care of the other term.

By Proposition 5, we know that if $w_{\min}/\log^2 n$ is high enough, then on an event $E_2$ of
probability at least $1 - \gamma$ we have $\|J - K\| < \gamma$. As above, for high enough $n$ the condition
is met. Now let

\[
F = D(A) - D(R), \qquad
G = (D(R) + F)^{1/2} - D(R)^{1/2}, \qquad
H = (D(R) + F)^{-1/2} - D(R)^{-1/2}.
\]

Observe

\[
\begin{aligned}
\|L - M\| &= \|(D(R) + F)^{-1/2} J (D(R) + F)^{1/2} - D(R)^{-1/2} K D(R)^{1/2}\| \\
&= \|(D(R)^{-1/2} + H) J (D(R)^{1/2} + G) - D(R)^{-1/2} K D(R)^{1/2}\| \\
&= \|D(R)^{-1/2}(J - K)D(R)^{1/2} + D(R)^{-1/2} J G + H J D(R)^{1/2} + HJG\| \\
&\leq \|D(R)^{-1/2}(J - K)D(R)^{1/2}\| + \|D(R)^{-1/2} J G\| + \|H J D(R)^{1/2}\| + \|HJG\|.
\end{aligned}
\]

Using Lemma 7 and standard series approximation arguments, for high enough $n$ we can
ensure $\|G\| < \gamma\|D(R)^{1/2}\|$ and $\|H\| < \gamma\|D(R)^{-1/2}\|$ on an event $E_3$ of probability at least
$1 - \gamma$. Using the fact that $\|J\| \leq 1$, the Cauchy-Schwarz inequality yields that each of the
middle two terms above is bounded by $\gamma$. For the last term, note that

||HJG|| < 7lD(R)i/l ■ ||D(R)-i/l = < ^. 

So it suffices to take care of the first term. This is accomplished by noticing that, on 

El n E2, 

||D(R)-V2(J - K)D(R)V2|| < ^J^P - K|| 

^ (1 + J - K|| definition of E^ 

1 + 7 

< — - — II J — K|| Assumption 3 

^ ( — definition of Eo. 

- (3 

Together, these facts show that for high enough $n$, on $E_1 \cap E_2 \cap E_3$, which occurs with
probability at least $1 - 8\gamma$, we have

By choosing $\gamma$ so that the right hand side is less than $\eta$ and $8\gamma < \eta$ (to take care of the
probability), the step is complete.
Step 3. Write $T = C + Y$, where $\|Y\| < \eta$. Note that

t-i 

(T + Y)* = T* + ^Xg, 

9=0 

where $X_q$ is a product of $q$ copies of $Y$ and $t - q$ copies of $T$ in some order. By the fact that
$\|T\| = 1$ and $\|Y\| < \eta$, we have $\|X_q\| < \eta^q$ for each $q \geq 1$. Then, by the triangle inequality,



Thus, 



Ex. 

q=0 



< 



q=0 



•it rr\t 



T < 



1 



1 — f] 



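Take $\mathbf{b}$ and $\mathbf{v}$ to be the vectors constructed in Step 1. Note that for $t \leq z - 1$ we have,
for high enough $n$,

\[
\begin{aligned}
\langle T^t\mathbf{v}, T^t\mathbf{v} \rangle_{s(A)}
&= \langle (C^t + Y_t)\mathbf{v}, (C^t + Y_t)\mathbf{v} \rangle_{s(A)} \\
&\geq \langle C^t\mathbf{v}, C^t\mathbf{v} \rangle_{s(A)} + 2\langle Y_t\mathbf{v}, C^t\mathbf{v} \rangle_{s(A)} \\
&\geq (1 - \eta)\langle C^t\mathbf{v}, C^t\mathbf{v} \rangle_{s(C)} + 2\langle Y_t\mathbf{v}, C^t\mathbf{v} \rangle_{s(A)}
  && \text{Lemma 7} \\
&\geq 2(1 - \eta)\varepsilon + 2\langle Y_t\mathbf{v}, C^t\mathbf{v} \rangle_{s(A)}
  && \text{definition of } z \\
&\geq 2(1 - \eta)\varepsilon - 2\|Y_t\mathbf{v}\|_{s(A)} \cdot \|C^t\mathbf{v}\|_{s(A)}
  && \text{Cauchy-Schwarz} \\
&\geq 2(1 - \eta)\varepsilon - 2\|Y_t\mathbf{v}\|_{s(A)}
  && \text{see below} \\
&\geq 2(1 - \eta)\varepsilon - 2\|Y_t\mathbf{v}\|
  && \text{definition of } \|\cdot\|_{s(A)} \\
&\geq 2(1 - \eta)\varepsilon - 2\eta.
\end{aligned}
\]
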

The step whose explanation is missing is straightforward; no entries in $\mathbf{v}$ have magnitude
exceeding 1/2 and multiplication by the stochastic matrix $C$ preserves this property. Since
$s(C)$ is a probability distribution, $\|C^t\mathbf{v}\|_{s(C)} \leq 1$ holds by definition of the inner product. If
$\eta$ is chosen so that $2(1 - \eta)\varepsilon - 2\eta > \varepsilon$, then the proof is complete. ∎
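7.2.3 Results on DWH and EDWH

Proof of Proposition 1. We will construct a $\mathbf{v}$ satisfying $\langle \mathbf{v}, \mathbf{e} \rangle_s = 0$ so that the absolute
value of the quantity $\langle \mathbf{v}, T\mathbf{v} \rangle_s / \langle \mathbf{v}, \mathbf{v} \rangle_s$ is equal to $|DWH(M)|$. Since $|\lambda_2| = \max\{|\beta_2|, |\beta_n|\}$,
this suffices by Proposition 4.
Define

\[
v_i =
\begin{cases}
\dfrac{1}{r d_i} & \text{if } i \in M \\[2ex]
-\dfrac{1}{(n - r) d_i} & \text{if } i \notin M.
\end{cases}
\]

Let $D = \sum_i d_i$ and note

\[
\langle \mathbf{v}, \mathbf{v} \rangle_s
= \sum_{i \in M} \frac{1}{(r d_i)^2} \cdot \frac{d_i}{D} + \sum_{i \notin M} \frac{1}{((n - r) d_i)^2} \cdot \frac{d_i}{D}
= \frac{1}{D} \left( \frac{1}{r^2} \sum_{i \in M} \frac{1}{d_i} + \frac{1}{(n - r)^2} \sum_{i \notin M} \frac{1}{d_i} \right).
\]

Also,

\[
\langle \mathbf{v}, T\mathbf{v} \rangle_s
= \frac{1}{D} \left( \frac{1}{r^2} \sum_{i,j \in M} \frac{A_{ij}}{d_i d_j}
+ \frac{1}{(n - r)^2} \sum_{i,j \notin M} \frac{A_{ij}}{d_i d_j}
- \frac{2}{r(n - r)} \sum_{i \in M,\, j \notin M} \frac{A_{ij}}{d_i d_j} \right). \tag{9}
\]

Dividing $\langle \mathbf{v}, T\mathbf{v} \rangle_s$ by $\langle \mathbf{v}, \mathbf{v} \rangle_s$, canceling $D$, and using the definition of $W$ yields the result. ∎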
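The construction in this proof can be checked exactly on a small example. The sketch below is illustrative only: the five-node graph and the group $M$ are hypothetical, and the code assumes the conventions $s_i = d_i/D$, $T_{ij} = A_{ij}/d_i$, and $\langle x, y \rangle_s = \sum_i s_i x_i y_i$, which are inferred from the proof rather than restated in it.

```python
from fractions import Fraction

def rayleigh_terms(A, M):
    """Compute <v,e>_s, <v,v>_s, <v,Tv>_s directly and via the expanded sums.

    A is a symmetric 0/1 adjacency matrix with no isolated nodes.
    M is a nonempty proper subset of node indices.
    """
    n = len(A)
    M = set(M)
    Mc = set(range(n)) - M
    r = len(M)
    d = [sum(row) for row in A]            # degrees
    D = sum(d)                             # total degree
    s = [Fraction(di, D) for di in d]      # weights s_i = d_i / D
    v = [Fraction(1, r * d[i]) if i in M else Fraction(-1, (n - r) * d[i])
         for i in range(n)]
    Tv = [sum(Fraction(A[i][j], d[i]) * v[j] for j in range(n)) for i in range(n)]

    ve = sum(s[i] * v[i] for i in range(n))
    vv = sum(s[i] * v[i] * v[i] for i in range(n))
    vTv = sum(s[i] * v[i] * Tv[i] for i in range(n))

    def pair_sum(B, C):
        return sum(Fraction(A[i][j], d[i] * d[j]) for i in B for j in C)

    vv_expanded = (Fraction(1, r ** 2) * sum(Fraction(1, d[i]) for i in M)
                   + Fraction(1, (n - r) ** 2) * sum(Fraction(1, d[i]) for i in Mc)) / D
    vTv_expanded = (Fraction(1, r ** 2) * pair_sum(M, M)
                    + Fraction(1, (n - r) ** 2) * pair_sum(Mc, Mc)
                    - Fraction(2, r * (n - r)) * pair_sum(M, Mc)) / D
    return ve, vv, vTv, vv_expanded, vTv_expanded

# Hypothetical example: edges 0-1, 0-2, 1-2, 2-3, 3-4 and M = {0, 1}.
A_example = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
ve, vv, vTv, vv_expanded, vTv_expanded = rayleigh_terms(A_example, {0, 1})
print(ve, vTv / vv)
```

Exact rational arithmetic is used so that the orthogonality condition and the two expansions can be compared with equality rather than a floating-point tolerance.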

We prove Lemma 5 before Theorem 2.

Proof of Lemma 5. We show that $EDWH(m, p_s, p_d) = (p_s - p_d)/(mp) = (p_s - p_d)/(p_s + (m - 1)p_d)$,
which is easily checked to be equal to $(H - 1)/(m - 1)$; this, in turn, converges
to $h = H/m$ as $m$ grows. Consider $M$ consisting of $k$ islands and $M^c$ consisting of $m - k$
islands of nodes. Then from the definition of $EDWH(M; m, p_s, p_d)$ it follows that

\[
EDWH(M; m, p_s, p_d) =
\frac{EW_{MM} + EW_{M^cM^c} - 2EW_{MM^c}}
{\frac{1}{|M|^2} \sum_{i \in M} \frac{1}{d} + \frac{1}{|M^c|^2} \sum_{i \in M^c} \frac{1}{d}},
\]

which can be written as

\[
\frac{\dfrac{k p_s + k(k - 1)p_d}{k^2 d^2} + \dfrac{(m - k)p_s + (m - k)(m - k - 1)p_d}{(m - k)^2 d^2} - \dfrac{2p_d}{d^2}}
{\dfrac{m^2}{k^2 n^2} \cdot \dfrac{kn}{dm} + \dfrac{m^2}{(m - k)^2 n^2} \cdot \dfrac{(m - k)n}{dm}}.
\]

This becomes

\[
\frac{\dfrac{p_s + (k - 1)p_d}{k} + \dfrac{p_s + (m - k - 1)p_d}{m - k} - 2p_d}
{\dfrac{dm}{kn} + \dfrac{dm}{(m - k)n}},
\]

or

\[
\frac{p_s - p_d}{pm},
\]

which is the claimed expression. Since this holds for all $M \in I(n)$, the result follows. ∎
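The algebra in this proof can be verified with exact arithmetic. The following sketch evaluates the intermediate expression for every admissible $k$ and compares it with $(p_s - p_d)/(p_s + (m - 1)p_d)$. The function name `edwh` and the numerical values are hypothetical, and the expected degree is taken to be $d = (n/m)(p_s + (m - 1)p_d) = np$, an assumption read off from the expressions in the proof.

```python
from fractions import Fraction

def edwh(k, m, n, ps, pd):
    """EDWH for a group M made of k of the m equal-sized islands (0 < k < m)."""
    size = Fraction(n, m)                       # nodes per island
    d = size * (ps + (m - 1) * pd)              # assumed expected degree, d = n * p
    M, Mc = k * size, (m - k) * size            # |M| and |M^c|
    # Expected within-group and cross-group weights, as in the proof.
    ew_mm = (k * size**2 * ps + k * (k - 1) * size**2 * pd) / (M**2 * d**2)
    ew_cc = ((m - k) * size**2 * ps
             + (m - k) * (m - k - 1) * size**2 * pd) / (Mc**2 * d**2)
    ew_mc = pd / d**2
    # Denominator: (1/|M|^2) * sum_{i in M} 1/d + (1/|M^c|^2) * sum_{i in M^c} 1/d.
    denom = (M / d) / M**2 + (Mc / d) / Mc**2
    return (ew_mm + ew_cc - 2 * ew_mc) / denom

# Hypothetical parameter values.
m, n = 5, 100
ps, pd = Fraction(3, 10), Fraction(1, 20)
target = (ps - pd) / (ps + (m - 1) * pd)
print([edwh(k, m, n, ps, pd) == target for k in range(1, m)])
```

The value does not depend on $k$, which is the point of the final step of the proof.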
Proof of Theorem 2. Note

\[
Q(P, n) = \frac{p_d}{p_s + (m - 1)p_d} E_m + \frac{p_s - p_d}{p_s + (m - 1)p_d} I_m,
\]

where $E_m$ denotes the $m$-by-$m$ matrix of ones and $I_m$ denotes the $m$-by-$m$ identity matrix.
Then, the eigenvalues of this matrix can be computed directly. The only nonzero eigenvalue
of the first matrix is

\[
\frac{m p_d}{p_s + (m - 1)p_d}
\]

with multiplicity 1; and adding

\[
\frac{p_s - p_d}{p_s + (m - 1)p_d} I_m
\]

just shifts all the eigenvalues by adding to them the constant multiplying the identity. Thus
the second largest eigenvalue of $Q(P,n)$ (after the eigenvalue 1) is

\[
\frac{p_s - p_d}{p_s + (m - 1)p_d}.
\]

Simple algebra shows that this is the same as the expression claimed in the theorem. ∎
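The eigenvalue computation can be confirmed on a concrete instance without any numerical library, by checking that the all-ones vector and a difference of two coordinate vectors are eigenvectors with the stated eigenvalues. This is only an illustrative sketch; the helper names and the values of $m$, $p_s$, $p_d$ are hypothetical.

```python
from fractions import Fraction

def islands_Q(m, ps, pd):
    """Representative-agent matrix of the islands model with m equal-sized islands."""
    denom = ps + (m - 1) * pd
    return [[(ps if i == j else pd) / denom for j in range(m)] for i in range(m)]

def matvec(Q, x):
    """Matrix-vector product with exact rational entries."""
    return [sum(q * xi for q, xi in zip(row, x)) for row in Q]

m, ps, pd = 4, Fraction(3, 10), Fraction(1, 20)
Q = islands_Q(m, ps, pd)
lam2 = (ps - pd) / (ps + (m - 1) * pd)

ones = [Fraction(1)] * m
diff = [Fraction(1), Fraction(-1)] + [Fraction(0)] * (m - 2)

# Q is row-stochastic, so the all-ones vector has eigenvalue 1.
print(matvec(Q, ones) == ones)
# Any vector summing to zero is an eigenvector with eigenvalue lam2.
print(matvec(Q, diff) == [lam2 * x for x in diff])
# Trace check: eigenvalue 1 once and lam2 with multiplicity m - 1.
print(sum(Q[i][i] for i in range(m)) == 1 + (m - 1) * lam2)
```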

Proof of Corollary 4. Let us first show that with a probability going to 1 all nodes have
degree between $(1 - f(n))d$ and $(1 + f(n))d$ for a function $f(n) \to 0$. From Lemma 2.1 in
Chung and Lu (2002) it follows that for any given $i$, $P(d_i > (1 - f(n))d) > 1 - e^{-(f(n))^2 d/3}$
for any function $f(n) < 1$. The probability that all nodes have degrees at least $(1 - f(n))d$ is
then at least $\left(1 - e^{-(f(n))^2 d/3}\right)^n$. Given that $d > \log^2(n)$, it follows that this expression is at
least

\[
\left(1 - \frac{1}{n^{(f(n))^2 \log(n)/3}}\right)^n,
\]

which goes to 1 as long as $(f(n))^2 \log(n)/3$ goes to $\infty$. A similar
argument establishes that all nodes have degrees at most $(1 + f(n))d$ with a probability going
to 1. Thus, with a probability going to 1, $s_{\min} \geq \frac{1 - f(n)}{(1 + f(n))n}$.

Note also that with a probability going to 1, $A$ is connected (e.g., apply Theorem
1, noting that $h(n)$ is bounded away from 1 so that (iv) applies, and (i)-(iii) apply given the
islands model and $d > \log^2(n)$). Thus, we can also apply Lemmas 3 and 4 to conclude that
with a probability going to 1



\[
\frac{\log(n^2/4\gamma) - \log((1 + f(n))n/(1 - f(n)))}{-2\log(|\lambda_2(T(A(P,n)))|)}
\leq CT(\gamma/n^2; A) \leq
\frac{\log(n^2/\gamma)}{-2\log(|\lambda_2(T(A(P,n)))|)}
\]



and 



log( 



< MT(7/n; A(P,n)) < 



log(^) + log((l + f{n))n/{l - f{n)))/2 



-log(|A2(T(A(P,n)))|) 

These imply that with a probability going to 1:

\[
\frac{\log(n) - \log(4\gamma) - \log((1 + f(n))/(1 - f(n)))}{-2\log(|\lambda_2(T(A(P,n)))|)}
\leq CT(\gamma/n^2; A) \leq
\frac{\log(n) - \log(\gamma)/2}{-\log(|\lambda_2(T(A(P,n)))|)}
\tag{10}
\]



and

\[
\frac{\log(n) - \log(2\gamma)}{-\log(|\lambda_2(T(A(P,n)))|)}
\leq MT(\gamma/n; A(P,n)) \leq
\frac{\log(n) - \log(2\gamma) + \log((1 + f(n))/(1 - f(n)))/2}{-\log(|\lambda_2(T(A(P,n)))|)}
\tag{11}
\]



Next, applying Theorem 2 and Lemma 5,

\[
\left| \lambda_2(T(A(P,n))) - \frac{H(n) - 1}{m(n) - 1} \right| \xrightarrow{p} 0.
\tag{12}
\]



Footnote: Set the $X_i$'s in their lemma to be the realization of the links that a given node might have to other
nodes.
Since $H(n) = h(n)m(n)$, it follows that $\frac{H(n) - 1}{m(n) - 1} = \frac{h(n)m(n) - 1}{m(n) - 1}$ is bounded away from 1. Thus,
from (12), we deduce that for any $1 > \delta > 0$, with a probability going to 1

\[
\frac{1 - \delta}{-\log\left(\frac{H(n) - 1}{m(n) - 1}\right)}
\leq \frac{1}{-\log(|\lambda_2(T(A(P,n)))|)}
\leq \frac{1 + \delta}{-\log\left(\frac{H(n) - 1}{m(n) - 1}\right)}.
\]

The corollary then follows from (10) and (11), noting that $f(n) \to 0$. ∎
Proof of Theorem 5. For the left hand side, apply Theorem 3 and then compute the
second eigenvalue of

\[
Q(P) =
\begin{pmatrix}
\dfrac{f_1 p_s}{f_1 p_s + f_2 p_d} & \dfrac{f_2 p_d}{f_1 p_s + f_2 p_d} \\[2ex]
\dfrac{f_1 p_d}{f_1 p_d + f_2 p_s} & \dfrac{f_2 p_s}{f_1 p_d + f_2 p_s}
\end{pmatrix}
\]

(the result appears in Jackson (2008), Section 8.3.6). For the right hand side, first use the
definition of DWH; then apply Lemma 7 to show the degrees in the denominators in the
DWH formula are arbitrarily close to their expectations; then use the strong law of large
numbers to conclude that the ratios appearing in the formula converge to their expectations.
∎
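For a two-by-two row-stochastic matrix, one eigenvalue is 1 and the eigenvalues sum to the trace, so the second eigenvalue is the trace minus 1. The sketch below applies this standard fact to $Q(P)$; it does not reproduce the derivation in Jackson (2008), and the function names and parameter values are hypothetical.

```python
from fractions import Fraction

def two_group_Q(f1, f2, ps, pd):
    """Two-type matrix Q(P) with group shares f1, f2 and link probabilities ps, pd."""
    r1 = f1 * ps + f2 * pd      # row normalizer for type 1
    r2 = f1 * pd + f2 * ps      # row normalizer for type 2
    return [[f1 * ps / r1, f2 * pd / r1],
            [f1 * pd / r2, f2 * ps / r2]]

def second_eigenvalue(Q):
    """Second eigenvalue of a 2x2 row-stochastic matrix: trace minus 1."""
    return Q[0][0] + Q[1][1] - 1

# Hypothetical example with unequal group sizes.
Q2 = two_group_Q(Fraction(1, 3), Fraction(2, 3), Fraction(3, 10), Fraction(1, 20))
lam = second_eigenvalue(Q2)
print(lam)
```

With equal group shares the value reduces to $(p_s - p_d)/(p_s + p_d)$, which agrees with the islands-model formula for $m = 2$.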

7.3 Results and Proofs for the Empirical Analysis 
7.3.1 Misidentifying Islands and Affine Bias

To justify including an intercept in our regressions, consider the following stylized elaboration
of the islands model. We have $m$ equally sized islands $N_1, \ldots, N_m$ and each of those islands
$k$ is divided into $r$ equally sized sub-islands $N_{k1}, \ldots, N_{kr}$. If $i$ and $j$ are in different islands,
then the linking probability between them is $p_d$. If $i$ and $j$ are in the same island, then the
linking probability depends on whether they are in the same sub-island or different
sub-islands. If they are in the same sub-island, then they are linked with probability $p_s$. And if
they are in the same island but different sub-islands, then they are linked with probability
$p_b$. We assume $0 < p_d < p_b \leq p_s$.

The idea is that the researcher has data on the islands but not the sub-islands. We will
now study, in this simple setting, what happens if the homophily $H$ is estimated as if the
data were generated by the islands model with islands $N_1, \ldots, N_m$ and no sub-islands.

If, without knowing about the sub-islands, we estimate the probability of same-type
nodes being linked, we are actually estimating the quantity

\[
\tilde{p}_s = \frac{p_s + (r - 1)p_b}{r},
\]
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and our estimate of H, the unnormalized homophily, will be 



mp 

where p will be estimated correctly by its sample analogue of link density. 

In this setting, it is not valid to apply Corollary 3 with the predictor of the second
eigenvalue computed based on the misidentified island structure. That is, the second eigenvalue
will not be well estimated by $\frac{\hat{H} - 1}{m - 1}$, where $\hat{H}$ is the estimate of $H$ just described. Instead, the
second eigenvalue of the representative agent matrix will be an affine function of $\frac{\hat{H} - 1}{m - 1}$.
This is the content of the following proposition.

Proposition 7. In the modified islands setting just described, if $x = \frac{\hat{H} - 1}{m - 1}$ is the regressor
computed without information about the sub-island structure, then

\[
\lambda_2(Q(P,n)) = \alpha x + \beta,
\]

where $\alpha$ and $\beta$ depend on $m$ and $r$.

Proof of Proposition 7. Letting $E_k$ denote the matrix of all ones of size $k$ and $I_k$ denote
the identity matrix of size $k$, we find that with $P$ specified by the description above,

r-./r> \ PdEimr + {Pb " Pd)Im ®^r + {Ps " {Pb ~ Pd))'^mr 

Q(P,n) = 



Now, the second eigenvalue of 



Ps + {r - l)pk + (m - l)rpd 

PdEmr + {Pb - Pdjim ® 

riPb-Pd) +Ps- (Pb-Pd) 



is r{pb -pd). Thus, 

A2(Q(P,n))- . 

Ps + [r - l)pi, + (m - l)rpd 

This can be rewritten as 

A2 Q(P,n)) = L . 

[m — l)r 

Letting x = be the regressor, we find that 

o m— 1 ^ ' 

A2(Q(P, n)) = X H . 



Footnote: This, by Theorem 3, is the limit of the second eigenvalue of the realized matrix.
∎

Now, in running the regressions, we do not make use of the details of the formula in
the proof. We merely note that there is an affine bias if there is some homophily inside the
islands on dimensions falling outside the scope of our data. Thus, including an intercept in
the regression of convergence rates on $x$ is a reasonable first-order approximation to account
for some of this affine bias.

Of course, in more realistic settings, the various kinds of symmetry present in this model 
will not exist. However, it appears that more general formulas or characterizations could 
be obtained describing how homophilies at various levels interact. This could be a useful 
direction to pursue in taking this model to empirical settings, where there will almost always 
be some underlying homophily on dimensions not captured by the data. 

7.3.2 The Asymptotic Equivalence of DWH and EDWH 

Proof of Lemma 6. Consider the islands model with $m(n) \geq 2$ equal-sized groups, and
probabilities of links within and across types $p_s$ and $p_d$, respectively, and consider any
sequence of groupings of islands $M(n) \in I(n)$. We show that

\[
|EDWH(M(n), m, p_s, p_d) - DWH(M(n), A(P,n))| \xrightarrow{p} 0.
\]

This follows from showing that $|EW_{M(n),M(n)} - W_{M(n)M(n)}(A(P,n))| \xrightarrow{p} 0$,
$|EW_{M(n),M^c(n)} - W_{M(n)M^c(n)}(A(P,n))| \xrightarrow{p} 0$, and
$\left(\sum_{i \in M(n)} \frac{1}{d(n)}\right) \Big/ \left(\sum_{i \in M(n)} \frac{1}{d_i(A(P,n))}\right) \xrightarrow{p} 1$ (and noting
that the denominator is bounded away from 0 in the limit).

The latter conclusion follows from the argument in the proof of Corollary 4 using Lemma
2.1 in Chung and Lu (2002) to show that with a probability going to 1 all nodes have
degree between $(1 - f(n))d(n)$ and $(1 + f(n))d(n)$ for any function $f(n) \to 0$ such that
$(f(n))^2 \log(n)/3$ goes to $\infty$. Next, we are given that $m(n)/n \to 0$, which implies that the number of
nodes within any island, $|i(n)| = n/m(n)$, is growing without bound, and so we can again
apply Lemma 2.1 in Chung and Lu (2002) to deduce that there is a function $g(n) \to 0$
such that with a probability going to 1,

\[
\begin{aligned}
(1 - g(n)) &\left[ k(n)p_s(n)|i(n)|^2/d(n)^2 + p_d(n)k(n)(k(n) - 1)|i(n)|^2/d(n)^2 \right]
\leq \sum_{i \in M,\, j \in M} T_{ij}T_{ji} \\
&\leq (1 + g(n)) \left[ k(n)p_s(n)|i(n)|^2/d(n)^2 + p_d(n)k(n)(k(n) - 1)|i(n)|^2/d(n)^2 \right],
\end{aligned}
\]

where $k(n)$ is the number of islands in $M(n)$ and similarly

\[
(1 - g(n))p_d(n)|M(n)||M^c(n)|/d(n)^2 \leq \sum_{i \in M,\, j \in M^c} T_{ij}T_{ji} \leq (1 + g(n))p_d(n)|M(n)||M^c(n)|/d(n)^2.
\]

These imply that

\[
|EW_{M(n),M(n)} - W_{M(n)M(n)}(A(P,n))| \xrightarrow{p} 0
\]

and

\[
|EW_{M(n),M^c(n)} - W_{M(n)M^c(n)}(A(P,n))| \xrightarrow{p} 0,
\]

as claimed. ∎

Footnote: Now, we work with the $X_i$'s in their lemma to be the realization of the links within a given $M(n)$.

7.4 Results of the Empirical Analysis for Mixing Time 

Table 6: Dependent variable = $\rho(MT(0.1/n; A))$ ($N = 82$)

Variable                        Coefficient    (t-statistic)
Intercept                       0.861          (66.9)
$\frac{H-1}{m-1}$ for "type"    0.287          (5.23)
$R^2$                           0.255


Table 7: Dependent variable = $\rho(MT(0.1/n; A))$ ($N = 82$)

Variable                        Coefficient    (t-statistic)
Intercept                       0.825          (34.6)
$\frac{H-1}{m-1}$ for grade     0.163          (4.24)
$R^2$                           0.181
Table 8: Dependent variable = Imputed convergence rate, $\rho(MT(0.1/n; A))$ ($N = 82$)

Coefficients, with t-statistics in parentheses.

Variable       All Homophilies    Grade Only       Gender Only      Race Only
Intercept      0.665 (22.4)       0.735 (23.3)     0.884 (60.2)     0.902 (95.6)
Grade DWH      0.309 (7.26)       0.284 (6.05)
Gender DWH     0.100 (2.02)                        0.183 (2.84)
Race DWH       0.104 (5.23)                                         0.0696 (2.65)
$R^2$          0.511              0.314            0.0917           0.0807


Figure 8: Rate of convergence of mixing time for the 82 friendship networks plotted against 
the degree weighted homophily in each of the networks calculated relative to grade. 
Figure 9: The relationship between mixing time and the prediction of Theorem 2 for grade.